The purpose of this walk-through is to introduce and document the analyses developed for the study Speech Intelligibility: A Bayesian generalized linear, latent and mixed modeling approach (citation needed).
In this walk-through, Section 3 introduces various background topics relevant to the present study; these topics enable readers to progress smoothly through the research. Specifically, Section 3.1 provides a brief explanation of how Bayesian inference procedures work and why they matter for this research. Section 3.2 is devoted to explaining the difference between two particular distributions, the normal and the beta-proportion distribution, and their role in modeling bounded data. Section 3.3 explains (generalized) linear mixed models, elaborating on their role in modeling (non)normal clustered and bounded data. Section 3.4 illustrates the concept of measurement error and the role of latent variables in overcoming the problems arising from it. Lastly, Section 3.5 explains the effects of distributional departures in the data on the parameter estimates, and their importance for this research.
The specific analyses for this study are elaborated from Section 4 onwards. In particular, Section 4 elaborates on the research gaps in the study of intelligibility. Section 5 introduces the research questions that guide the study. Section 6 explores the data and its implications. Section 7 thoroughly develops the methods to analyze the data. Lastly, Section 8 provides answers to the research questions at hand.
Interludes
Bayesian inference
Theory
Bayesian inference is an approach to statistical modeling and inference that is primarily based on Bayes’ theorem. The procedure aims to derive appropriate inference statements about a set of parameters by revising and updating their occurrence probabilities in light of new evidence (Everitt and Skrondal, 2010). The procedure consists of defining the model assumptions in the form of a likelihood for the outcome and a set of prior distributions for the parameters of interest. This allows the derivation of a set of posterior distributions for the parameters, from which the statistical inferences are derived 1. As an example, a simple linear regression model with a parameter \beta can be encoded under the Bayesian inference paradigm in the following form:
Bayesian inference
Approach to statistical modeling and inference that aims to derive appropriate inference statements about one or a set of parameters by revising and updating their probabilities in light of new evidence (Everitt and Skrondal, 2010).
\begin{align}
P(\beta | Y, X ) &= \frac{ P( Y | \beta, X ) \cdot P( \beta ) }{ P( Y ) }
\end{align}
\tag{1}
where P( Y | \beta, X ) defines the likelihood of the outcome, which represents the probability distribution of the outcome Y, given the parameter \beta and covariate X. Hence, the likelihood is the probability distribution assumption that a set of observations occur, conditioned on the value of a set of parameters (Everitt and Skrondal, 2010). In other words, it describes the assumption about the underlying process that gives rise to the data.
Likelihood
Probability distribution assumption that a set of observations occur, given the value of a set of parameters (Everitt and Skrondal, 2010).
P( \beta ) defines the prior distribution of the parameter \beta. A prior is a probability distribution that summarizes the available information about a parameter, prior to observing empirical data (Everitt and Skrondal, 2010). In other words, it reflects the knowledge about a parameter prior to observing any data.
Prior distribution
Probability distribution that summarizes information about a random variable or parameter at a given time point, prior to observing empirical data (Everitt and Skrondal, 2010).
P( Y ) defines the probability distribution of the data, which represents the evidence of the observed empirical data.
As a result P( \beta | Y, X ), which denotes the posterior distribution of the parameter, describes the probability distribution of \beta after observing empirical data. Specifically, it combines the likelihood of the outcome and the parameter’s prior distribution, considering that empirical data have been observed.
Posterior distribution
Probability distribution that summarizes information about a random variable or parameter after new information has been obtained from empirical data (Everitt and Skrondal, 2010).
Before implementing the Bayesian inference procedures, two important concepts related to Equation 1 need to be understood. First, the evidence of the empirical data P(Y) serves as a normalizing constant. This is just another way of saying that the numerator in the equation is rescaled by a constant obtained from calculating P(Y). Consequently, without loss of generality, the equation can be succinctly rewritten in the following form:
\begin{align}
P(\beta | Y, X ) &\propto P( Y | \beta, X ) \cdot P( \beta ) \\
\end{align}
\tag{2}
where \propto denotes the proportionality symbol. This implies that the posterior distribution of \beta is proportional (up to a constant) to the product of the outcome’s likelihood and the parameter’s prior distribution. This definition makes the calculation of the posterior distribution easier by separating the parameter-updating process from the integration of new empirical data (this will be clearly seen in the code provided in Section 3.1.3).
Second, a dataset usually has multiple observations of the outcome Y and covariate X, in the form of y_{i} and x_{i}. Therefore, by the laws of probability and assuming independence among the observations, the likelihood of the full dataset can be rewritten as the product of all individual observation likelihoods. Consequently, Equation 2 can also be rewritten as follows:
\begin{align}
P(\beta | Y, X ) &\propto \prod_{i=1}^{n} P( y_{i} | \beta, x_{i} ) \cdot P( \beta )
\end{align}
\tag{3}
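Numerically, the product in Equation 3 underflows quickly as n grows, which is why such likelihoods are usually computed on the log scale. A small Python sketch of this point, using simulated data (all names and values are illustrative):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(1)

# simulated data under the likelihood assumption, with beta = 0.2
beta = 0.2
x = rng.normal(size=100)
y = beta * x + rng.normal(size=100)

# likelihood of the full dataset (Equation 3): product of the
# individual observation likelihoods P(y_i | beta, x_i)
lik = np.prod(norm.pdf(y, loc=beta * x, scale=1))

# numerically safer equivalent: exponentiate the sum of the log-likelihoods
loglik = np.sum(norm.logpdf(y, loc=beta * x, scale=1))
print(lik, np.exp(loglik))  # the two quantities agree
```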
Estimation methods
Several methods within the Bayesian inference procedures can be utilized to estimate the posterior distribution of the parameter, and most of these fall into the category of Markov Chain Monte Carlo methods (MCMC). MCMC are methods to indirectly simulate random observations from probability distributions using stochastic processes (Everitt and Skrondal, 2010) 2.
Markov Chain Monte Carlo (MCMC)
Methods to indirectly simulate random observations from probability distributions using stochastic processes (Everitt and Skrondal, 2010).
However, when the parameters of interest are not large in number, a useful pedagogical method to produce the posterior distribution is the grid approximation method. Through this method, an excellent approximation of the parameter’s posterior distribution can be achieved by considering a finite candidate list of parameter values. This method is used in Section 3.1.3 to illustrate how Bayesian inference works 3.
Grid approximation
Method to indirectly simulate random observations from low dimensional continuous probability distributions, by considering a finite candidate list of parameter values (McElreath, 2020).
How does it work?
A simple Bayesian linear regression model can be written in the following form:
\begin{align*}
y_{i} &= \beta \cdot x_{i} + e_{i} \\
e_{i} &\sim \text{Normal}( 0, 1 ) \\
\beta &\sim \text{Uniform}( -20, +20 )
\end{align*}
where y_{i} denotes the outcome’s observation i, \beta the expected effect of the observed covariate x_{i} on the outcome, and e_{i} the outcome’s residual in observation i. Furthermore, the model assumes the residual e_{i} is normally distributed with mean zero and standard deviation equal to one. Lastly, prior to observing any data, it is assumed that \beta is uniformly distributed within the range [-20,+20].
However, a more convenient generalized manner to represent the same linear regression model is as follows:
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, +20 )
\end{align*}
In this definition, the components of the Bayesian inference procedure detailed in Section 3.1.1 are more easily spotted. First, regarding the likelihood, the outcome is assumed to be normally distributed with mean \mu_{i} and standard deviation equal to one. Second, \beta is assumed to have a uniform prior within the range [-20,+20]. Additionally, the equations reveal that the mean of the outcome \mu_{i} is modeled by a linear predictor composed of the covariate x_{i} and its effect on the outcome \beta.
For illustration purposes, a simulated regression with n=100 observations was generated assuming \beta=0.2. Figure 1 shows the scatter plot of the generated data (see code below). The grid approximation method is used to generate random observations from the posterior distribution of \beta. Two noteworthy results emerge from the approach. Firstly, once the posterior distribution is generated, various summaries can be used to make inferences about the parameter of interest (refer to the code output below). Secondly, when considering a dataset with n=100 observations, the influence of the prior on the posterior distribution of \beta is negligible. Specifically, prior to observing any data, assuming that \beta could take any value within the range [-20,+20] with equal probability (left panel of Figure 2) did not have a substantial impact on the distribution of \beta after empirical data were observed (right panel of Figure 2).
Prior to observing empirical data, assuming the parameter could take any value within the range [-20,+20] with equal probability is not the only prior assumption that can be made. Different levels of uncertainty associated with a parameter can be encoded by different priors. This concept is illustrated in Figure 3 through Figure 5, where three different types of priors are used to encode three levels of uncertainty about the parameter \beta.
The posterior distributions shown in Figure 3 through Figure 5 were obtained by computing, for each candidate value of \beta, the linear predictor and the product of the individual observation likelihoods, and then combining this outcome data likelihood with each of three priors: a uniform prior distribution (min=-20, max=+20), a normal prior distribution (mean=0, sd=0.5), and a normal prior distribution (mean=0.2, sd=0.05).
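A minimal Python sketch of this grid approximation (variable names and the simulated data are illustrative, not the study’s actual code):

```python
import numpy as np
from scipy.stats import norm, uniform

rng = np.random.default_rng(42)

# simulated regression data, assuming beta = 0.2 and n = 100
n, beta_true = 100, 0.2
x = rng.normal(size=n)
y = beta_true * x + rng.normal(size=n)

# finite candidate list of values for beta (the grid)
grid = np.linspace(-1, 1, 2001)

# log of the product of individual observation likelihoods, per candidate
loglik = np.array([norm.logpdf(y, loc=b * x, scale=1).sum() for b in grid])

# three priors encoding three levels of uncertainty about beta
priors = {
    "Uniform(-20, +20)": uniform.pdf(grid, loc=-20, scale=40),
    "Normal(0, 0.5)": norm.pdf(grid, loc=0, scale=0.5),
    "Normal(0.2, 0.05)": norm.pdf(grid, loc=0.2, scale=0.05),
}

# posterior for each prior: likelihood times prior, normalized over the grid
results = {}
for name, prior in priors.items():
    unnorm = np.exp(loglik - loglik.max()) * prior
    posterior = unnorm / unnorm.sum()
    mean = float((grid * posterior).sum())
    sd = float(np.sqrt(((grid - mean) ** 2 * posterior).sum()))
    results[name] = (mean, sd)
    print(f"{name}: posterior mean {mean:.3f}, sd {sd:.3f}")
```

The posterior mean and standard deviation computed over the grid are examples of the summaries used to make inferences about \beta.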
First, the distribution depicted in Figure 3 assumes \beta \sim \text{Uniform}(-20, +20) (similar to what is observed in Section 3.1.3). The distribution does not constrain the effect of \beta to be more probable in any particular range within [-20, +20]. This type of distribution is commonly referred to as a non-informative prior. A non-informative prior distribution reflects the committal of a distribution to a wide range of values within a specific parameter space prior to observing any data (Everitt and Skrondal, 2010).
Non-informative priors
Prior distribution that reflects the committal of a distribution to a wide range of values within a specific parameter space prior to observing any data (Everitt and Skrondal, 2010).
Figure 3: Bayesian inference: posterior distributions with non-informative prior distribution.
Second, the distribution described in Figure 4 assumes \beta \sim \text{Normal}(0, 0.5). Consequently, the effect of \beta is more probable within the range [-1,+1], with less probability associated with parameter values outside this range. This is an example of a weakly-informative prior distribution. Weakly informative priors are distributions that exhibit no bias towards positive or negative effects, but very weakly constrain the effects of the parameter within a realistic range (McElreath, 2020).
Weakly informative priors
Prior distributions that exhibit no bias towards positive or negative effects, but constrain very weakly the effects of the parameter within a realistic range (McElreath, 2020).
Figure 4: Bayesian inference: posterior distributions with weakly-informative prior distribution.
Third, the distribution described in Figure 5 assumes \beta \sim \text{Normal}(0.2, 0.05). As a result, the effect of \beta is more probable within the range [0.1,0.3], with less probability associated with parameter values outside this range. This is an example of an informative prior distribution. Informative priors are distributions that express specific and definite information about a parameter (McElreath, 2020).
Informative priors
Prior distributions that express specific and definite information about a parameter (McElreath, 2020).
Figure 5: Bayesian inference: posterior distributions with informative prior distributions.
Lastly, regarding the influence of different priors on the posterior distributions, Figure 3 and Figure 4 reveal that non-informative and weakly-informative priors have a negligible influence on the posterior distribution: both priors result in similar posteriors. Furthermore, the figures show that a sample size of n=100 is still not enough to provide an unbiased and precise estimate of the true effect. In contrast, Figure 5 shows that informative priors can have a meaningful influence on the posterior distribution. In this particular case, the prior helps to estimate an unbiased and more precise effect. These results show that when the sample size is not sufficiently large, the prior assumptions can play a significant role in obtaining appropriate parameter estimates.
What are Hyperpriors?
It is crucial to consider that, for greater modeling flexibility and a more refined representation of the parameters’ uncertainty, it is sometimes convenient to define a prior distribution in terms of hyperparameters and hyperpriors 4. Hyperparameters are parameters that index a family of possible prior distributions for another parameter; hyperpriors are prior distributions for these hyperparameters (Everitt and Skrondal, 2010).
Hyperparameters
Parameters \theta_{2} that index a family of possible prior distributions for another parameter \theta_{1} (Everitt and Skrondal, 2010).
A simple example of the use of hyperpriors would be to define the regression model shown in Section 3.1.3 in the following form:
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Normal}( 0, \text{exp}(v) ) \\
v &\sim \text{Normal}(0, 3)
\end{align*}
where v defines the hyperparameter for the parameter \beta, and its associated distribution defines its hyperprior.
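Sampling from this prior proceeds in two stages: first the hyperparameter v is drawn from its hyperprior, and then \beta is drawn given v. A small illustrative sketch in Python (names and sizes are assumptions for illustration):

```python
import numpy as np

rng = np.random.default_rng(7)

# draw beta from its prior by first drawing the hyperparameter v from its
# hyperprior, and then drawing beta ~ Normal(0, exp(v))
v = rng.normal(loc=0.0, scale=3.0, size=10_000)
beta = rng.normal(loc=0.0, scale=np.exp(v))

# the implied prior on beta is symmetric around zero, but its spread is
# itself random, governed by the hyperprior on v
print(np.median(beta))
```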
However, setting prior distributions through hyperparameters brings its own challenges 5. One notable challenge pertains to the geometry of the parameter’s sample space: some probabilistic representations of a prior, defined through the same hyperparameters, exhibit simpler sample-space geometries than others 6. The re-parametrization of priors into such simpler geometries leads to the notion of non-centered priors. In this approach, a parameter’s prior distribution is expressed in terms of a hyperparameter, which is defined by a transformation of the original parameter of interest (Gorinova et al., 2019). By incorporating non-centered priors, researchers can ensure the reliability of certain posterior distributions within Bayesian inference procedures. To illustrate, a straightforward example of a non-centered reparametrization of a prior is as follows:
Non-centered priors
Expression of a parameter’s distribution in terms of a hyperparameter defined by a transformation of the original parameter of interest (Gorinova et al., 2019).
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &= z \cdot \text{exp}(v) \\
v &\sim \text{Normal}(0, 3) \\
z &\sim \text{Normal}( 0, 1 )
\end{align*}
where z is a hyperparameter sampled independently from v, and the parameter of interest \beta is obtained as a transformation of the two hyperparameters. Figure 6 illustrates the differences in sampling geometries between a centered and a non-centered parametrization. The sampling geometry depicted in the left panel of the figure is narrower than the one depicted in the right panel; as a result, Bayesian inference procedures have a harder time sampling from the former than from the latter.
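That the two parametrizations encode the same prior can be checked by simulation; the sketch below (illustrative, not the study’s code) draws \beta both ways and compares the resulting distributions:

```python
import numpy as np

rng = np.random.default_rng(3)
n = 100_000

# centered form: beta ~ Normal(0, exp(v)), with v ~ Normal(0, 3)
v_c = rng.normal(0.0, 3.0, size=n)
beta_centered = rng.normal(0.0, np.exp(v_c))

# non-centered form: beta = z * exp(v), with z ~ Normal(0, 1) and v ~ Normal(0, 3)
v_n = rng.normal(0.0, 3.0, size=n)
z = rng.normal(0.0, 1.0, size=n)
beta_noncentered = z * np.exp(v_n)

# both parametrizations imply the same prior distribution for beta,
# as seen, for example, in their quartiles
print(np.percentile(beta_centered, [25, 75]))
print(np.percentile(beta_noncentered, [25, 75]))
```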
The normal distribution
A normal distribution is a type of continuous probability distribution in which a random variable can take on values along the real line \left( y_{i} \in (-\infty, \infty) \right). The distribution is characterized by two independent parameters: the mean \mu and the standard deviation \sigma (Everitt and Skrondal, 2010). Thus, a random variable can take on values gathered around a mean \mu, with some values dispersed based on some amount of deviation \sigma, without any restriction. Importantly, by definition of the normal distribution, the location (mean) of the distribution does not influence its spread (deviation).
Figure 7 illustrates how the distribution of an outcome changes with different values of \mu and \sigma. The left panel demonstrates that the distribution of the outcome can shift in terms of its location based on the value of \mu. The right panel shows how the distribution can become narrower or wider based on the value of \sigma. It is noteworthy that alterations in the mean \mu of the distribution have no impact on its standard deviation \sigma.
Figure 7: Normal distribution with different mean and standard deviations
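The independence of location and spread described above can be verified numerically; the short sketch below (illustrative) shows that shifting the mean merely translates the density:

```python
import numpy as np
from scipy.stats import norm

# evaluate two normal densities that differ only in their mean
x = np.linspace(-10.0, 10.0, 2001)
pdf_mu0 = norm.pdf(x, loc=0.0, scale=1.0)
pdf_mu3 = norm.pdf(x + 3.0, loc=3.0, scale=1.0)

# shifting the mean translates the density without changing its spread:
# the density at x under mu=0 equals the density at x+3 under mu=3
print(np.allclose(pdf_mu0, pdf_mu3))  # True
```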
The beta-proportion distribution
A beta-proportion distribution is a type of continuous probability distribution in which a random variable can assume values within the continuous interval between zero and one \left( y_{i} \in [0, 1] \right). The distribution is characterized by two parameters: the mean \mu and the sample size M (Everitt and Skrondal, 2010). This implies that a random variable can take on values restricted to the unit interval, centered around a mean \mu, with some values being more dispersed based on the sample size M. Additionally, two characteristics define the distribution. Firstly, like the random variable, the mean of the distribution can only take values within the unit interval (\mu \in [0,1]). Secondly, the mean and sample size parameters are no longer independent of each other.
Figure 8 illustrates how an outcome with a beta-proportion distribution changes with different values of \mu and M. The figure reveals two prevalent patterns in the distribution: (1) the behavior of the dispersion, as measured by the sample size, depends on the mean of the distribution, and (2) the larger the sample size, the less dispersed the distribution is within the unit interval.
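Assuming the usual mean/sample-size parametrization of the beta distribution, in which the standard shape parameters are a = \mu M and b = (1-\mu)M, these patterns can be checked numerically (the helper name below is illustrative):

```python
import numpy as np
from scipy.stats import beta as beta_dist

def beta_proportion(mu, M):
    """Beta distribution parametrized by mean mu and sample size M,
    i.e. standard shape parameters a = mu * M and b = (1 - mu) * M."""
    return beta_dist(a=mu * M, b=(1 - mu) * M)

d10 = beta_proportion(mu=0.3, M=10)
d100 = beta_proportion(mu=0.3, M=100)

print(d10.mean())             # the mean of the distribution is mu = 0.3
print(d10.std(), d100.std())  # a larger sample size M yields less dispersion
```

Under this parametrization the standard deviation is \sqrt{\mu(1-\mu)/(M+1)}, which makes both patterns explicit: the dispersion depends on the mean \mu, and it shrinks as M grows.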
In contrast, the significance of the beta-proportion distribution lies in providing a suitable alternative for modeling non-normally distributed bounded outcomes, such as the entropy scores utilized in this study. Boundedness refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur (Lebl, 2022). Neglecting the bounded nature of an outcome can lead to underfitting. Underfitting occurs when the statistical model fails to capture the underlying patterns or complexity of the data. In such a scenario, a model may generate predictions outside the data range, which can be physically inconsistent or highly unlikely (Everitt and Skrondal, 2010), resulting in an inability to generalize its results when confronted with new data.
Boundedness
Refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur (Lebl, 2022)
Underfitting
When the statistical model fails to capture the underlying patterns or complexity of the data (Everitt and Skrondal, 2010).
Linear Mixed Models
The ordinary LMM
An ordinary linear mixed model (LMM) is a procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations, and one or more covariates (Holmes, Bolin, and Kelley, 2019). A commonly known Bayesian probabilistic representation of an ordinary LMM can be expressed as follows:
Ordinary linear mixed model (LMM)
Procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations, and one or more covariates (Holmes et al., 2019).
\begin{align*}
y_{ib} &= \beta \cdot x_{i} + a_{b} + \varepsilon_{ib} \\
\varepsilon_{ib} &\sim \text{Normal}( 0, 1 ) \\
\beta &\sim \text{Normal}( 0, 0.5 ) \\
a_{b} &\sim \text{Normal}( 0, 1 )
\end{align*}
where y_{ib} denotes the outcome’s i’th observation clustered in block b, and x_{i} denotes the covariate for observation i. Moreover, \beta denotes the fixed slope of the regression, a_{b} denotes the random effects, and \varepsilon_{ib} defines the random outcome residuals. The residuals \varepsilon_{ib} are assumed to be normally distributed with mean zero and standard deviation equal to one. Additionally, prior to observing any data, \beta is assumed to be normally distributed with mean zero and standard deviation equal to 0.5. Similarly, a_{b} is assumed to be normally distributed with mean zero and standard deviation equal to one.
The generalized LMM
Generalized linear mixed models (GLMM) are a set of models used to estimate (non)linear relationships between the mean of a (non)normally distributed outcome with clustered observations and one or more covariates (Lee and Nelder, 1996). Interestingly, the ordinary Bayesian LMM detailed in the previous section can be represented as a special case of a GLMM, as follows:
Generalized linear mixed model (GLMM)
Procedure employed to estimate (non)linear relationship between the mean of a (non)normally distributed outcome with clustered observations, and one or more covariates (Lee and Nelder, 1996).
\begin{align*}
y_{ib} &\sim \text{Normal}( \mu_{ib}, 1 ) \\
\mu_{ib} &= \beta \cdot x_{i} + a_{b} \\
\beta &\sim \text{Normal}( 0, 0.5 ) \\
a_{b} &\sim \text{Normal}( 0, 1 )
\end{align*}
Notice this representation explicitly highlights the four components of a Bayesian GLMM: the likelihood component, the linear predictor, the link function and the priors (McElreath, 2020). The likelihood component specifies the assumption about the distribution of an outcome, in this case a normal distribution with mean \mu_{ib} and standard deviation equal to one. The linear predictor specifies the manner in which the covariate will predict the mean of the outcome. In this case the linear predictor is a linear combination of the parameter \beta, the covariate x_{i}, and the random effects a_{b}. The link function specifies the relationship between the mean of the outcome \mu_{ib} and the linear predictor. In this case no transformation is applied to the linear predictor to match its range with the range of the outcome, as both can take on values within the real line (refer to Section 3.2.1). Lastly, the priors describe what is known about the parameters \beta and a_{b} before observing any empirical data.
GLMM components
Likelihood component
Linear predictor
Link function
Prior distributions
On the other hand, a beta-proportion LMM is also a GLMM, and it can be represented probabilistically as follows:
\begin{align*}
y_{ib} &\sim \text{BetaProp}( \mu_{ib}, 10 ) \\
\mu_{ib} &= \text{logit}^{-1}( \beta \cdot x_{i} + a_{b} ) \\
\beta &\sim \text{Normal}( 0, 0.5 ) \\
a_{b} &\sim \text{Normal}( 0, 1 )
\end{align*}
Notice that this representation also highlights the four components of a Bayesian GLMM; however, their assumptions are now slightly different. The likelihood component assumes a beta-proportion distribution for the outcome, with mean \mu_{ib} and sample size equal to 10. The linear predictor is still a linear combination of the parameter \beta, the covariate x_{i}, and the random intercepts a_{b}. However, the link function now assumes the mean of the outcome is (non)linearly related to the linear predictor by an inverse-logit function: \text{logit}^{-1}(x) = \exp(x) / (1+\exp(x)). The inverse-logit function allows the linear predictor to match the range of the mean of the beta-proportion distribution, \mu_{ib} \in [0,1] (refer to Section 3.2.2). Lastly, the prior assumptions for \beta and a_{b} are also declared.
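The role of the inverse-logit link can be sketched directly; the snippet below (illustrative) shows that any value of the linear predictor is mapped into the unit interval:

```python
import numpy as np

def inv_logit(x):
    """logit^{-1}(x) = exp(x) / (1 + exp(x)): maps the real line into (0, 1)."""
    return np.exp(x) / (1.0 + np.exp(x))

# any value of the linear predictor is mapped to a valid mean mu in (0, 1)
eta = np.array([-20.0, -1.0, 0.0, 1.0, 20.0])
mu = inv_logit(eta)
print(mu)
```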
Importance
Understanding LMM is essential due to the ubiquitous assumption of normally distributed outcomes within the speech intelligibility research field (see Boonen et al., 2021; Flipsen, 2006; Lagerberg et al., 2014). Furthermore, their significance also lies in their ability to model clustered outcomes. Clustering occurs when multiple observations arise from the same individual, location, or time (McElreath, 2020). Accounting for data clustering is essential, as disregarding it may result in biased and inefficient parameter estimates. Consequently, such biases and inefficiencies can diminish statistical power or increase the likelihood of committing a type I error. Statistical power defines the model’s ability to reject the null hypothesis when it is false (Everitt and Skrondal, 2010). Type I error occurs when a null hypothesis is erroneously rejected (Everitt and Skrondal, 2010).
Clustering
Occurs when multiple observations arise from the same individual, location, or time (McElreath, 2020).
Moreover, the significance of GLMM lies in offering the same benefits as the LMMs, in terms of parameter unbiasedness and efficiency. However, the framework also allows for the modeling of (non)linear relationships of (non)normally distributed outcomes. This is particularly important for modeling bounded data, such as the entropy scores utilized in this study. Refer to Section 3.2.3 to understand the importance of considering the bounded nature of the data in the modeling process.
Measurement error in an outcome
What is the problem?
The problem of measurement error in an outcome is easier to understand with a motivating example. Using a similar model as the one depicted in Section 3.1.3, the probabilistic representation of measurement error in the outcome can be depicted as follows:
\begin{align*}
\tilde{y}_{i} &\sim \text{Normal}( y_{i}, s ) \\
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, 20 )
\end{align*}
This representation effectively means that a manifest outcome \tilde{y}_{i} is assumed to be normally distributed with a mean equal to the latent outcome y_{i} and a measurement error s. The latent outcome y_{i} is also assumed to be normally distributed but with a mean \mu_{i} and a standard deviation of one. The mean of the latent outcome is considered to be explained by a linear combination of the covariate x_{i} and its expected effect \beta. Lastly, prior to observing any data, \beta is assumed to follow a uniform distribution within the range of [-20, +20], representing a non-informative prior.
For illustrative purposes, a simulated outcome with n=100 observations was generated, assuming \beta=0.2, and a measurement error of s=2. Figure 9 shows the scatter plot of the generated data (see code below). The left panel of the figure demonstrates that the manifest outcome has a larger spread than the latent outcome depicted in the right panel. As a result, although \beta is expected to be estimated in an unbiased manner, the statistical hypothesis tests for the parameter will likely be affected due to this larger variability.
The estimation output confirms the previous hypothesis. The posterior distribution of \beta, estimated using the manifest outcome, has a larger standard deviation than the one estimated using the appropriate latent outcome (see Figure 10 and code output below). Furthermore, the code output shows the parameter’s posterior distribution can no longer reject the null hypothesis at confidence levels of 90\% and 95\%, indicating a reduced statistical power.
Figure 10: Bayesian inference: grid approximation on measurement error outcomes
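The data-generating process behind this illustration can be sketched as follows (illustrative Python, not the study’s code):

```python
import numpy as np

rng = np.random.default_rng(11)
n, beta, s = 100, 0.2, 2.0

x = rng.normal(size=n)
y_latent = beta * x + rng.normal(size=n)             # latent outcome, residual sd = 1
y_manifest = y_latent + rng.normal(scale=s, size=n)  # manifest outcome, error sd = s

# the manifest outcome shows a larger spread than the latent outcome
print(y_latent.std(), y_manifest.std())
```

The larger spread of the manifest outcome is what inflates the posterior standard deviation of \beta when the measurement error is ignored.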
How to solve it?
Latent variables can be used to address the problem arising from the larger observed variability in one or more manifest outcomes. A latent variable is a variable that cannot be directly measured but is assumed to be primarily responsible for the variability in one or more manifest variables (Everitt and Skrondal, 2010). Latent variables can be interpreted as hypothetical constructs, traits, or true variables that account for the variability that induce dependence in one or more manifest variables (Rabe-Hesketh, Skrondal, and Pickles, 2004a). This concept is akin to a linear mixed model, where the random effects serve to account for the variability that induces dependence within clustered outcomes (Rabe-Hesketh et al., 2004a) (refer to Section 3.3). The most widely known examples of latent variable models include Confirmatory Factor Analysis and Structural Equation Models (CFA and SEM, respectively).
Latent variables
Variables that cannot be measured directly but are assumed to be the principal responsible for the common variability in one or more manifest variables (Everitt and Skrondal, 2010).
Commonly, latent variable models consist of two parts: a measurement part and a structural part. In the measurement part, the principles of the Thurstonian model (Luce, 1959; Thurstone, 1927) are employed to aggregate one or more manifest variables and estimate a latent variable. In the structural part, regression-like relationships among latent and other manifest variables are specified, allowing researchers to test hypotheses about their (causal) relationships (Hoyle, 2014). While the measurement part is sometimes of interest in its own right, the substantive model of interest is often defined by the structural part (Rabe-Hesketh et al., 2004a).
Importance
It becomes evident that when an outcome is measured with error, the estimation procedures based on standard assumptions yield inefficient parameter estimates. This implies that the parameters are not estimated with sufficient precision. Consequently, such inefficiency can reduce statistical power and increase the likelihood of committing a type II error, which occurs when a null hypothesis is erroneously accepted (Everitt and Skrondal, 2010).
Therefore, the issue of measurement error in an outcome is highly relevant to this study. This research assumes that a speaker’s (latent) potential intelligibility contributes, in part, to the observed variability in the speaker’s (manifest) entropy scores. Given the interest in testing hypotheses about the potential intelligibility of speakers, and considering that the entropy scores are subject to measurement error, it becomes necessary to use latent variables to generate precise parameter estimates to test the hypothesis of interest.
Distributional departures
Heteroscedasticity
In the context of regression analysis, heteroscedasticity occurs when the variance of an outcome depends on the values of another variable. The opposite case is called homoscedasticity. An example of heteroscedasticity can be probabilistically represented as follows:
Heteroscedasticity
Occurs when the variance (standard deviation) of an outcome depends on the values of another variable. The opposite case is called homoscedasticity(Everitt and Skrondal, 2010).
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, \sigma_{i} ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\sigma_{i} &= exp( \gamma \cdot x_{i} ) \\
\beta &\sim \text{Uniform}( -20, 20 ) \\
\gamma &\sim \text{Uniform}( -20, 20 )
\end{align*}
This representation implies that an outcome y_{i} is assumed normally distributed with mean \mu_{i} and standard deviation \sigma_{i}. Furthermore, the mean and standard deviation of the outcome are explained by the covariate x_{i}, through the parameters \beta and \gamma. Lastly, prior to observing any data, \beta and \gamma are assumed to be uniformly distributed in the range [-20,+20].
Figure 11 illustrates the presence of heteroscedasticity using the previous representation, assuming a sample size of n=100 and parameters \beta=0.2 and \gamma=1. Notice that the variability of the outcome increases as the covariate increases. Consequently, it is easy to intuit that this difference in the outcome’s variability could have an impact on the statistical hypothesis tests of \beta, and even on the estimate itself. To prove the intuition, an incorrect model is used to estimate \beta.
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, 20 )
\end{align*}
As a result, the intuition is proven accurate. When an outcome is erroneously assumed homoscedastic, the parameter estimates not only become inefficient but are also estimated farther from the true value, as seen in the code output below and in Figure 12.
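The inefficiency can be made concrete with a small simulation study. The Python sketch below (illustrative; the estimator names are mine) repeatedly simulates the heteroscedastic data and compares the sampling variance of two slope estimators: plain least squares, which wrongly assumes constant variance, and a precision-weighted version that uses the known variances.

```python
import math
import random

random.seed(2)

beta, gamma, n, reps = 0.2, 1.0, 100, 200
x = [i / (n - 1) * 2 for i in range(n)]
w = [math.exp(-2.0 * gamma * xi) for xi in x]  # known precisions 1 / sigma_i^2

def slope_ols(xs, ys):
    # least squares through the origin; assumes constant variance
    return sum(a * b for a, b in zip(xs, ys)) / sum(a * a for a in xs)

def slope_wls(xs, ys, ws):
    # weights each point by its precision, matching the true noise model
    num = sum(wi * a * b for wi, a, b in zip(ws, xs, ys))
    den = sum(wi * a * a for wi, a in zip(ws, xs))
    return num / den

b_ols, b_wls = [], []
for _ in range(reps):
    y = [random.gauss(beta * xi, math.exp(gamma * xi)) for xi in x]
    b_ols.append(slope_ols(x, y))
    b_wls.append(slope_wls(x, y, w))

def variance(v):
    m = sum(v) / len(v)
    return sum((vi - m) ** 2 for vi in v) / len(v)

v_ols, v_wls = variance(b_ols), variance(b_wls)
print(v_ols > v_wls)  # ignoring heteroscedasticity inflates the estimator's variance
```

Both estimators are unbiased here, but the homoscedastic one wastes information, which is what "inefficient" means in this context.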
Outliers
In regression analysis, outliers are defined as observations that appear to deviate markedly from the other sample data points among which they occur (Everitt and Skrondal, 2010). Although outliers admit no unique probabilistic representation, a simple example can be illustrated with Figure 13. The figure depicts the presence of three influential observations in the outcome (colored blue). It is easy to intuit that, in the presence of influential observations, the parameter estimates, and the hypothesis tests resulting from them, can be affected.
Outlier
An observation that appears to deviate markedly from the other sample data points among which it occurs (Everitt and Skrondal, 2010).
This intuition is proven correct when \beta is estimated using the same incorrect model used in Section 3.5.1. When an outcome is erroneously assumed to be free of outliers, the parameter value is estimated farther from the truth, as observed in the code output below and in Figure 14.
As recommended by McElreath (2020), robust regression models can be used to deal with these types of distributional departures. Robust regression models are a general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate failures in the assumptions of a model (Everitt and Skrondal, 2010). The procedure consists of modifying the statistical models to include traits that effectively make them robust to small departures from the distributional assumptions, such as heteroscedastic errors or the presence of outliers.
Robust regression models
A general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate failures in the assumption of a model (Everitt and Skrondal, 2010).
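The effect of robustification is easy to demonstrate. A common robustification replaces the normal likelihood with a heavier-tailed Student-t; the Python sketch below instead uses a median-of-slopes (Theil-Sen-style) estimator as a simple stand-in robust procedure, purely for illustration (the data and names are invented, not the study's).

```python
import random
import statistics

random.seed(3)

beta, n = 0.5, 50
x = [i / (n - 1) * 2 for i in range(n)]
y = [random.gauss(beta * xi, 0.1) for xi in x]
y[-3:] = [yi + 5.0 for yi in y[-3:]]  # contaminate with three gross outliers

# non-robust: least squares through the origin
b_ls = sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

# robust stand-in: median of the per-point slopes (Theil-Sen flavored)
b_rob = statistics.median(yi / xi for xi, yi in zip(x, y) if xi > 0)

print(round(b_ls, 2), round(b_rob, 2))  # the robust slope stays near the true 0.5
```

A handful of contaminated points drags the least-squares slope well away from 0.5, while the median-based estimate is barely moved, which is the behavior robust regression models aim for.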
Importance
It is known that dealing with heteroscedasticity and the identification of outliers through preliminary univariate procedures is prone to the erroneous transformation or exclusion of valuable information. This can ultimately bias the parameter estimates, and even make them inefficient (McElreath, 2020). Bias refers to the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated (Everitt and Skrondal, 2010).
Bias
Refers to the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated (Everitt and Skrondal, 2010).
Dealing with the possibility of heteroscedasticity or outlying observations is relevant to the present study because there is an interest in testing hypotheses about the potential intelligibility of speakers. Therefore, it is necessary to consider the use of robust regression models to address these distributional departures and generate unbiased parameter estimates.
Research gaps
The process of decoding the words in a message, known as intelligibility, is crucial for understanding the intended meaning or purpose of a message. Speech intelligibility refers to the extent to which a listener can accurately recover the elements in an acoustic signal produced by a speaker, such as phonemes or words (Freeman, Pisoni, Kronenberger, and Castellanos, 2017; Heuven, 2008; Whitehill and Chau, 2004).
In the research of speech intelligibility, studies that utilized entropy scores have extensively explored the statistical modeling of data clustering. For instance, Boonen et al. (2021) addressed the data clustering of entropy by employing LMM (refer to Section 3.3.1). However, to the best of the authors’ knowledge, no prior research has specifically addressed the modeling of the bounded nature of the entropy data. Section 3.2.3 and Section 3.3.3 present compelling arguments on the critical importance of considering both features in the statistical modeling of the data.
Furthermore, no study has attempted to construct a latent variable that unveils the speaker’s (underlying) potential intelligibility. Section 3.4.3 elaborates on the importance of considering measurement error in statistical models, and the role of latent variables in accomplishing such a task.
Hence, this study aims to fill these gaps by introducing the Bayesian beta-proportion linear, latent and mixed model, a type of generalized linear, latent and mixed model (GLLAMM) (Rabe-Hesketh et al., 2004a, 2004b, 2004c; Skrondal and Rabe-Hesketh, 2004). Section 3.1.6 elaborates on the relevance of Bayesian inference procedures for this study.
Research questions
This study aims to address three research questions that provide insights into modeling speech intelligibility using clustered and bounded entropy data.
Research question 1: Does the Bayesian beta-proportion linear, latent and mixed model account for all the observed variability in the data and yield better predictions? More specifically, the study seeks to determine whether the model can improve the predictions of entropy data at different clustering levels (i.e., word, sentence, and speaker level) compared to the ubiquitous normally distributed model.
Research question 2: Can a measure of a speaker’s potential intelligibility be constructed? In this regard, the study will leverage the principles laid down in Section 3.4.3 to aggregate listener assessments in the form of (manifest) entropy scores into a latent variable that might unveil the speaker’s (latent) potential intelligibility.
Research question 3: Can the speaker’s potential intelligibility be used to test specific research hypotheses? In particular, the study aims to demonstrate how the new analytic paradigm could be employed to examine whether child-related factors, such as children’s chronological age and hearing status, contribute to intelligibility.
Data
The data comprised the transcriptions of children’s spontaneous speech samples originally collected by Boonen et al. (2021). The analysis code is illustrated with the dataset, enabling readers to verify the correctness of the implementation. However, the data are not publicly available due to privacy restrictions. Nonetheless, the data can be provided by the corresponding author upon reasonable request.
Transcription task
For the transcription task, 105 listeners and 320 sentences, with a maximum of 11 words per sentence (speech samples), were randomly assigned to five blocks. Each block consisted of approximately 21 listeners who transcribed 64 sentences presented in a random order. As a result, a total of 47,514 transcribed words were generated from the original 2,263 words present in the speech samples.
Speech samples
Sentences with a minimum of 3 and a maximum of 11 words per sentence.
Entropy calculation
The 47,514 orthographic transcriptions were automatically aligned at the sentence level with a Python script, in a column-like grid structure similar to the one presented in Table 1. This alignment process was repeated for each sentence within each speaker and block, and the output was manually checked and adjusted (if needed) to appropriately align the words. For more details on the alignment procedure, refer to Boonen et al. (2021).
Next, the aligned transcriptions were aggregated by listener, yielding 2,263 entropy scores, one score per word. The entropy scores were calculated following Shannon’s entropy formula (1948):
\begin{align*}
H_{wsib} = - \frac{ \sum_{k=1}^{K} p_{k} \cdot log_{2}( p_{k} ) }{ log_{2}( J ) }
\end{align*}
where H_{wsib} denotes the entropy score bounded within the continuous interval between zero and one \left( H_{wsib} \in [0,1] \right), with w denoting the word number, s the sentence number, i the speaker number, and b the block number. Moreover, K describes the number of different word types within the transcriptions, p_{k} denotes the proportion of word type k within the transcriptions, and J defines the total number of word transcriptions. Notice that, by design, the total number of word transcriptions J corresponds to the number of listeners per block, i.e., 21 listeners. Furthermore, p_{k} is calculated as follows:
\begin{align*}
p_{k} = \frac{ \sum_{j=1}^{J} 1( T_{jk} ) }{ J }
\end{align*}
where 1(T_{jk}) denotes an indicator function that takes the value of one when the word type k is present in the transcription j.
The resulting entropy scores served as the outcome variable, capturing the agreement or disagreement among listeners’ word transcriptions. A high degree of agreement between transcriptions indicates higher intelligibility and yields lower entropy scores; a low degree of agreement indicates lower intelligibility and yields higher entropy scores (Boonen et al., 2021; Faes, De Maeyer, and Gillis, 2021).
Entropy interpretation
A high degree of agreement between transcriptions indicates higher intelligibility and yields lower entropy scores. A low degree of agreement indicates lower intelligibility and yields higher entropy scores (Boonen et al., 2021; Faes et al., 2021).
Table 1: Hypothetical alignment of word transcriptions and entropy score calculations for the first sentence, produced by the first speaker assigned to the first block, and transcribed by five listeners \left( s=1, i=1, b=1, J=5 \right). [B] represent a blank space, and [X] an unidentifiable speech. Extracted from Boonen et al. (2021), and slightly modified for illustrative purposes.
| Transcription | Word 1 | Word 2 | Word 3 | Word 4 | Word 5 |
|---|---|---|---|---|---|
| 1 | de (the) | jongen (boy) | ziet (sees) | een (a) | kikker (frog) |
| 2 | de (the) | jongen (boy) | ziet (sees) | de (the) | [X] |
| 3 | de (the) | jongen (boy) | zag (saw) | [B] | kokkin (cook) |
| 4 | de (the) | jongen (boy) | zag (saw) | geen (no) | kikkers (frogs) |
| 5 | de (the) | hond (dog) | zoekt (searches) | een (a) | [X] |
| Entropy | 0 | 0.3109 | 0.6555 | 0.8277 | 1 |
It is relevant to exemplify the entropy calculation procedure. For that purpose, the words in positions two, four and five of Table 1 were used. These words were assumed present in the first sentence, produced by the first speaker assigned to the first block, and transcribed by five listeners (w=\{2,4,5\}, s=1, i=1, b=1, J=5).
For the second word, the first four listeners identified the word type jongen (T_{j1}), while the last one identified the word type hond (T_{j2}). Therefore, two word types were identified (K=2), with proportions equal to \{ p_{1}, p_{2} \} = \{ 4/5, 1/5 \} = \{ 0.8, 0.2 \}, and an entropy score equal to:
H_{2111} = - \frac{ 0.8 \cdot log_{2}(0.8) + 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} \approx 0.3109
For the fourth word, two listeners identified the word type een (T_{j1}), one listener the word type de (T_{j2}), and another the word geen (T_{j3}). It is important to highlight the presence of a blank space [B] in transcription number three. A blank space [B] is a symbol that denotes the absence of a word in a position where a word was expected, as compared with the other transcriptions. Notice that, for calculation purposes, because the blank space in transcription three was not expected, it was considered a different word type (T_{j4}). Therefore, four word types were registered (K=4), with proportions equal to \{ p_{1}, p_{2}, p_{3}, p_{4} \} = \{ 2/5, 1/5, 1/5, 1/5 \} = \{ 0.4, 0.2, 0.2, 0.2 \}, and an entropy score equal to:
H_{4111} = - \frac{ 0.4 \cdot log_{2}(0.4) + 3 \cdot 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} \approx 0.8277
Lastly, for the fifth word, each listener transcribed a different word. It is important to highlight that when a listener did not manage to identify a complete word, or part of it, (s)he was instructed to write [X] in that position. However, for the calculation of the entropy score, if more than one listener marked an unidentifiable word with [X], each one was considered a different word type. This was done to avoid the artificial reduction of the entropy score, as [X] values already indicate the lack of the word’s intelligibility. Consequently, five word types were observed, T_{j1}=kikker, T_{j2}=[X], T_{j3}=kokkin, T_{j4}=kikkers, T_{j5}=[X] (K=5), with proportions equal to \{ p_{1}, p_{2}, p_{3}, p_{4}, p_{5} \} = \{ 1/5, 1/5, 1/5, 1/5, 1/5 \} = \{ 0.2, 0.2, 0.2, 0.2, 0.2 \}, and an entropy score equal to:
\begin{align*}
H_{5111} = - \frac{ 5 \cdot 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} = 1
\end{align*}
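The worked examples above can be reproduced in a few lines. This is a Python sketch of the normalized entropy formula (the study's own alignment pipeline is not shown here); the distinct labels for the two [X] marks implement the convention just described.

```python
import math
from collections import Counter

def entropy(types):
    """Normalized Shannon entropy of one word position across J listeners."""
    J = len(types)
    counts = Counter(types)
    h = -sum((c / J) * math.log2(c / J) for c in counts.values())
    return h / math.log2(J)

# word positions from Table 1 (J = 5 listeners); the two [X] marks are
# tagged as distinct word types, per the convention in the text
word2 = ['jongen', 'jongen', 'jongen', 'jongen', 'hond']
word4 = ['een', 'de', '[B]', 'geen', 'een']
word5 = ['kikker', '[X]a', 'kokkin', 'kikkers', '[X]b']

print(round(entropy(word2), 4))  # 0.3109
print(round(entropy(word4), 4))  # 0.8277
print(round(entropy(word5), 4))  # 1.0
```

The three printed values match H_{2111}, H_{4111} and H_{5111} computed by hand above.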
As expected, the data exploration reveals two significant features of the entropy scores: clustering and boundedness (refer to Section 3.2.3 and Section 3.3.3). In the case of the entropy scores, clustering arises due to the presence of various word-level scores generated for numerous sentences, originating from different speakers and evaluated in different blocks (see the code output below, depicting the first ten observations of the data). On the other hand, entropy scores exhibit boundedness as they can only take values within the continuous interval between zero and one, particularly H_{wsib} \in [0,1] (see Figure 15, showing three randomly selected children).
require(rethinking)                                # package requirement

set.seed( 12345 )                                  # seed for replication
children = sample( 1:32, size=3, replace=F )       # random sample of three children

par(mfrow=c(1,3))
for( i in children ){                              # histogram for all sentences of each child
  idx_child = data_H$childID == i
  hist( data_H$Hwsib[idx_child], xlab='', xlim=c(-0.2,1.2), breaks=15,
        col=rgb(0,0,0,alpha=0.2), main=paste0('Child ', i, ', all sentences') )
  abline( v=c(0,1), col='gray', lty=2 )
}
par(mfrow=c(1,1))
Figure 15: Entropy scores distribution: all sentences of selected children
Additionally, the data show that the 320 children’s speech samples consist of sentences with a minimum of 3 and a maximum of 11 words per sentence (\mu=7.07, \sigma=1.06), where most of the speech samples have between 5 and 9 words per sentence (see Figure 16).
speech_samples = data.frame( speech_samples )
hist( speech_samples$Freq, breaks=20, xlim=c(2, 12),   # histogram of words per sentence
      main='', xlab='words per sentence' )
Figure 16: Histogram of words per sentence in the speech samples
Code
psych::describe( speech_samples$Freq )    # statistical descriptors for the speech samples
vars n mean sd median trimmed mad min max range skew kurtosis se
X1 1 320 7.07 1.06 7 7.04 1.48 3 11 8 0.19 1.45 0.06
Moreover, the data comprised 16 normal-hearing children (NH, hearing status category 1) and 16 hearing-impaired children with a cochlear implant (HI/CI, hearing status category 2). Most of the NH and HI/CI children were 82 and 85 months old, respectively. Additionally, the minimum chronological age registered in the sample is 68 months.
Code
d_mom = unique( data_H[,c('childID','HS','A')] )   # unique hearing status and chronological age per child
with( d_mom, table( A, HS ) )                      # number of children per chronological age and hearing status
After examining the data, it is emphasized that no observations were excluded from the modeling process based on preliminary or univariate procedures. As detailed in Section 3.5, the identification of outliers through univariate procedures is prone to the erroneous exclusion of valuable information, which can ultimately bias the parameter estimates. Consequently, their identification was performed within the context of the proposed models, as recommended by McElreath (2020).
Lastly, before fitting the models using Bayesian inference, the data were formatted as a list including all the necessary information for the fitting process:
List of 12
$ N : int 2263
$ B : int 5
$ I : int 32
$ U : int 10
$ W : int 11
$ cHS : int 2
$ bid : int [1:2263] 1 1 1 1 1 1 1 1 2 2 ...
$ cid : int [1:2263] 1 1 1 1 1 1 1 1 1 1 ...
$ uid : int [1:2263] 1 1 1 1 1 1 1 1 2 2 ...
$ HS : int [1:2263] 2 2 2 2 2 2 2 2 2 2 ...
$ Am : int [1:2263] 17 17 17 17 17 17 17 17 17 17 ...
$ Hwsib: num [1:2263] 0.057 0.279 0.279 0.461 0.113 ...
Methods
This subsection presents the probabilistic formalism for the statistical models utilized in the current study. It also details the estimation procedure employed to fit the models, and the criteria used to assess the quality of the Bayesian inference results. Lastly, it presents the set of fitted models, and the methodology for selecting among them.
The Bayesian inference procedures closely follow the When-to-Worry-and-How-to-Avoid-the-Misuse-of-Bayesian-Statistics (WAMBS) checklist (Depaoli and Schoot, 2017). The WAMBS checklist is a questionnaire designed to outline the ten main points that should be meticulously examined when employing Bayesian inference procedures, with the ultimate goal of enhancing the transparency and replicability of the analysis.
WAMBS checklist
Questionnaire designed to outline the ten main points that should be meticulously examined when employing Bayesian inference procedures, with the ultimate goal of enhancing the transparency and replicability of the analysis (Depaoli and Schoot, 2017).
Models
Normal GLLAMM
A generalized linear, latent and mixed model (GLLAMM) (Rabe-Hesketh et al., 2004a, 2004b, 2004c; Skrondal and Rabe-Hesketh, 2004) is a unifying framework used to estimate (non)linear relationships between the mean of (non)normally distributed manifest variables with clustered observations, while also estimating relationships between latent variables and one or more covariates. Under this framework, the normal linear, latent and mixed model is composed of two parts: a response part and a structural equation part. The probabilistic representation of the response part can be stated as follows:
Generalized linear, latent and mixed model (GLLAMM)
Unifying framework used to estimate (non)linear relationships between the mean of (non)normally distributed manifest variables with clustered observations, while also estimating relationships between latent variables and one or more covariates (Rabe-Hesketh et al., 2004a, 2004b, 2004c; Skrondal and Rabe-Hesketh, 2004).
\begin{align*}
H_{wsib} &\sim \text{Normal}( \mu_{ib}, \sigma_{i} ) \\
\mu_{ib} &= a_{b} - SI_{i}
\end{align*}
where H_{wsib} denotes the manifest word-level entropy scores for sentence s and child i in block b. Furthermore, \mu_{ib} denotes the mean of the outcome and \sigma_{i} its standard deviation. Additionally, a_{b} denotes the block random effect, and SI_{i} denotes the speaker’s (latent) potential intelligibility score. On the other hand, the structural equation part, which relates the children’s potential intelligibility to the child-related factors, can be represented probabilistically in the following manner:
\begin{align*}
SI_{i} &= \alpha + \alpha_{HS[i]} + \beta_{A,HS[i]} \cdot ( A_{i} - \bar{A} ) + e_{i} + u_{i}
\end{align*}
where HS_{i} defines the hearing status and A_{i} the chronological age in months of child i. \alpha defines the general intercept, and \alpha_{HS[i]} denotes the potential intelligibility for the different hearing status groups. Moreover, \beta_{A,HS[i]} denotes the evolution of potential intelligibility per unit of chronological age for each hearing status group. Lastly, e_{i} denotes the unexplained potential intelligibility differences between children, and u_{i} the average unexplained potential intelligibility variability among sentences for each child, defined as u_{i} = \sum_{s=1}^{S} u_{si}/S, with S denoting the total number of sentences per child.
Five features can be noticed in the previous probabilistic representations. First, the full definition of the model shows four of the five components of a Bayesian GLLAMM: the response model, with its likelihood component, linear predictor and link function; and the structural equation part. The fifth component, priors and hyperpriors, is fully developed in Section 7.2. Equation 6 reveals that for the likelihood component of the response model, the outcome is assumed to be normally distributed with mean \mu_{ib} and standard deviation \sigma_{i}. Moreover, the linear predictor of the response model is a linear combination of the parameters a_{b} and SI_{i} with a direct link function, implying the absence of a transformation of the linear predictor. Lastly, Equation 7 shows that the structural equation part relates the potential intelligibility to the child-related factors.
GLLAMM components
Response model, random component
Response model, linear predictor
Response model, link function
Structural equation model
Priors and Hyperpriors
Second, Equation 6 reveals that the response model representation can consider a different standard deviation per child (indicated by the subscript i). This trait effectively makes the regression model robust to small departures from the normal distributional assumption or to the presence of outliers (refer to Section 3.5).
Third, Equation 6 also shows that in the response model representation, the potential intelligibility has a negative linear relationship with the average entropy scores. This feature makes explicit the inverse relationship between the potential intelligibility and entropy scores detailed in Section 6.2.
Fourth, Equation 7 shows the chronological age is centered around the minimum age \bar{A}. The centering procedure is done to facilitate the interpretation of the parameters (Everitt and Skrondal, 2010). Specifically, the centering is done to avoid interpreting the parameter estimates outside the range of the chronological ages available in the data.
Fifth and last, Equation 7 shows the model assumes the potential intelligibility has two sources of unexplained variability: one from the specific selection of children, e_{i}, and another from the specific selection of sentences for each child, u_{i}. While e_{i} assumes that different children inherently have different levels of potential intelligibility, u_{i} assumes that different sentences measure the potential intelligibility of children differently. The latter is assumed to be due to the different word difficulties and their interplay within the sentence.
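To make the two variance sources concrete, here is a hedged Python sketch of the structural part as just described. Every numeric value (group intercepts, slopes, spreads) is an invented illustration, not a value from the study; \bar{A} is set to the sample's minimum age of 68 months.

```python
import random

random.seed(5)

# illustrative values only -- not the study's estimates or priors
alpha = 0.0
alpha_HS = {1: 0.3, 2: -0.3}      # hearing-status intercepts (NH, HI/CI)
beta_A_HS = {1: 0.02, 2: 0.03}    # age slope per hearing-status group
A_bar = 68                        # centering at the minimum age in months

def simulate_SI(HS, A, S=10):
    e_i = random.gauss(0, 0.2)                        # child-level differences
    u_si = [random.gauss(0, 0.1) for _ in range(S)]   # per-sentence variability
    u_i = sum(u_si) / S                               # averaged over S sentences
    return alpha + alpha_HS[HS] + beta_A_HS[HS] * (A - A_bar) + e_i + u_i

SI = simulate_SI(HS=1, A=82)  # a NH child of 82 months
print(SI)
```

Averaging the sentence terms into u_{i} is what makes it a per-child quantity, matching the definition u_{i} = \sum_{s} u_{si}/S in the text.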
Beta-proportion GLLAMM
In a similar fashion, the proposed beta-proportion linear, latent and mixed model is composed of two parts: a response part and a structural part. The response part assumes the following probabilistic structure:
\begin{align*}
H_{wsib} &\sim \text{BetaProportion}( \mu_{ib}, M_{i} ) \\
\mu_{ib} &= \text{logit}^{-1}( a_{b} - SI_{i} )
\end{align*}
On the other hand, the structural equation part is defined as in Equation 7, i.e., in the same way as in the normal linear, latent and mixed model.
Similar to the normal GLLAMM, the probabilistic representation of the beta-proportion GLLAMM has three main features. First, the representation also highlights four of the five components of a GLLAMM. For the likelihood component of the response model, the outcome is assumed to be beta-proportion distributed with mean \mu_{ib} and sample size M_{i}. The linear predictor of the response model is a linear combination of the parameters a_{b} and SI_{i}, with an inverse-logit link function \text{logit}^{-1}(x) = \exp(x) / (1+\exp(x)). Lastly, the structural equation part also relates the children’s potential intelligibility to the child-related factors.
Second, Equation 8 reveals the response model representation also considers a different sample size per child (indicated by the subscript in M_{i}). As with the normal GLLAMM, models with this trait are dubbed robust regression models (refer to Section 3.5).
Third and last, Equation 8 shows that in the response model representation, the children’s potential intelligibility has a negative (non)linear relationship with the entropy scores. This feature still makes explicit the inverse relationship between the children’s potential intelligibility and the entropy scores, as seen in Section 6.2.
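The inverse-logit link and the mean/sample-size parameterization can be sketched as follows (Python; mapping (\mu, M) to the standard Beta shapes as (\mu M, (1-\mu)M) is the usual convention, which I assume here).

```python
import math
import random

def inv_logit(x):
    # maps any real linear predictor into the open interval (0, 1)
    return math.exp(x) / (1.0 + math.exp(x))

# larger SI_i (more intelligible) -> smaller predicted mean entropy
a_b = 0.0                          # illustrative block effect
for si in (-2.0, 0.0, 2.0):
    print(si, round(inv_logit(a_b - si), 3))

# beta-proportion draws via the (mu*M, (1-mu)*M) shape mapping (assumed)
def rbeta_prop(mu, M):
    return random.betavariate(mu * M, (1.0 - mu) * M)

random.seed(6)
draws = [rbeta_prop(mu=0.3, M=10.0) for _ in range(1000)]
mean_draw = sum(draws) / len(draws)
print(round(mean_draw, 2))         # close to the mean parameter 0.3
```

The loop makes the inverse relationship visible: as `si` grows, the implied mean entropy shrinks, yet it can never leave (0, 1), which is exactly why the beta-proportion likelihood suits bounded entropy scores.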
Prior distributions
In this section, prior predictive simulations are conducted for all parameters that necessitate prior assumptions. Prior predictive simulation is a procedure employed to generate data based on the prior distributions. The purpose of the procedure is to assign sensible priors and comprehend their implications in the context of a model, before integrating any information derived from empirical data (McElreath, 2020).
Prior predictive simulations
Procedure employed to generate data based on the prior distributions. The purpose of the procedure is to assign sensible priors and comprehend their implications in the context of a model, before integrating any information derived from empirical data (McElreath, 2020).
The procedure involved simulating parameters independently in either the structural or the response part of the models. These parameters were later transformed onto the entropy score scale through the response part. The specific steps employed are found in the adjacent code within each parameter’s corresponding section.
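The two-step procedure can be sketched for the beta-proportion model's intercept. The Normal(0, 0.05) prior below is my own assumption, chosen only so that the mass lies mostly within [-0.1, 0.1] as stated later in the text; the study's actual prior is given in Equation 9.

```python
import math
import random

random.seed(7)

def inv_logit(x):
    return math.exp(x) / (1.0 + math.exp(x))

# step 1: draw the parameter from its (assumed) prior on the intelligibility scale
alpha_draws = [random.gauss(0.0, 0.05) for _ in range(5000)]

# step 2: push each draw through the response part onto the entropy scale
mu_entropy = [inv_logit(a) for a in alpha_draws]

mean_mu = sum(mu_entropy) / len(mu_entropy)
print(min(mu_entropy) > 0.0, max(mu_entropy) < 1.0, round(mean_mu, 2))
# under the inverse-logit link the implied entropy means never leave (0, 1)
```

Inspecting the implied outcome distribution before seeing data is the whole point of prior predictive simulation: unreasonable priors reveal themselves on the entropy scale.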
The probabilistic representations for the normal GLLAMM show that seven parameters require the assumption of a prior distribution (refer to Section 7.1.1). Similarly, the representation for the beta-proportion GLLAMM requires seven prior distribution assumptions (refer to Section 7.1.2).
The parameters found in the structural part of both models can be enumerated as follows:
the general intercept \alpha,
the hearing status effects \alpha_{HS[i]},
the chronological age per hearing status effects \beta_{A,HS[i]},
the child variability e_{i},
the child-sentence average variability u_{i}
Furthermore, the parameters present in the response part of either model are:
the random block effect a_{b}, present in both models,
the standard deviation \sigma_{i}, present only in the normal GLLAMM, and
the sample size M_{i}, present only in the beta-proportion GLLAMM.
Lastly, after the establishment of all the prior distributions, a prior predictive simulation is also provided for the children’s potential intelligibility SI_{i}.
Intercepts \alpha
The parameter’s prior distribution for the normal and beta-proportion GLLAMM is described in Equation 9. The distribution serves as an informative prior, which aims to set the scale for the speaker’s (latent) potential intelligibility, a requirement often seen in latent variable models (Depaoli, 2021) (refer to Section 3.1.4).
The left panel of Figure 17 shows the prior restricts \alpha to be mostly within the range [-0.1, 0.1]. The implications for the scale of potential intelligibility are: (1) no particular bias towards a positive or negative intercept is evident, and (2) the scale is initially centered around zero with high certainty.
On the other hand, the right panel of Figure 17 demonstrates that when the \alpha prior is transformed to the entropy scale under the normal GLLAMM, the model anticipates a concentration of data around lower levels of entropy, and beyond the feasible range of the outcome. This prior might seem inappropriate at first glance, but other priors hide even more undesirable assumptions.
Figure 17: Normal GLLAMM, general intercept prior distribution: intelligibility and entropy scale
In contrast, the right panel of Figure 18 demonstrates that when the \alpha prior is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a concentration of data around mid levels of entropy, and not beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative entropy scores is expected by the model.
Figure 18: Beta-proportion GLLAMM, general intercept prior distribution: intelligibility and entropy scale
Hearing status effects \alpha_{HS[i]}
The prior distribution of the parameters for the normal and beta-proportion GLLAMM is described in Equation 10. For each of the two hearing status categories present in the data there is one prior (refer to Section 6.3). Notably, the same prior is applied to both categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.
The left panel of Figure 19 reveals a weakly informative prior that restricts the probability mass of \alpha_{HS[i]} mostly to the range [-0.6, 0.6] (refer to Section 3.1.4). This implies that no particular bias towards positive or negative values for the potential intelligibility of the different hearing status groups is present in the priors.
However, the right panel of Figure 19 demonstrates that when the prior of \alpha_{HS[i]} is transformed to the entropy scale, the model anticipates a concentration of data around lower levels of entropy, and outside the feasible range of the outcome. This prior might seem inappropriate at first glance, but other priors hide even more undesirable assumptions.
Figure 19: Normal GLLAMM, hearing status effects prior distribution: intelligibility and entropy scale
Conversely, the right panel of Figure 20 demonstrates that when the \alpha_{HS[i]} prior is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a concentration of data around mid levels of entropy, and not beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative entropy scores is expected by the model.
Figure 20: Beta-proportion GLLAMM, hearing status effects prior distribution: intelligibility and entropy scale
Chronological age per hearing status \beta_{A,HS[i]}
The prior distribution of \beta_{A,HS[i]} for the normal and beta-proportion GLLAMM is described in Equation 11. For each of the two hearing status categories present in the data there is one prior (refer to Section 6.3). Notably, the same prior is applied to both categories. This choice implies that the evolution of potential intelligibility attributed to chronological age between the categories is presumed to have similar uncertainties prior to the observation of empirical data. Furthermore, even when the model did not consider the interaction of chronological age and hearing status (\beta_{A} instead of \beta_{A,HS[i]}), the same prior was utilized (refer to Section 7.6).
The left panel of Figure 21 shows a weakly informative prior that restricts \beta_{A,HS[i]} to be within the range of [-0.2, 0.2]. This implies that no particular bias towards positive or negative effects is present for the evolution of potential intelligibility due to chronological age per hearing status group.
The right panel of Figure 21, on the other hand, demonstrates that when the prior of \beta_{A,HS[i]} is transformed to the entropy scale, the model anticipates a concentration of data around lower entropy values. More importantly, the model expects entropy scores beyond the feasible range of the outcome. This assumption may initially appear inappropriate; however, considering again the probabilistic representation of the model, other priors hide even more undesirable assumptions.
Figure 21: Normal GLLAMM, chronological age per hearing status effects prior distribution: intelligibility and entropy scale
In contrast, the right panel of Figure 22 demonstrates that when the prior of \beta_{A,HS[i]} is transformed to the entropy scale, the model anticipates a slight concentration of data around mid levels of entropy, but more importantly, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative effects for the evolution of potential intelligibility per hearing status group is present.
Code annotations:
Beta-proportion GLLAMM chronological age per hearing status effects prior distribution definition: intelligibility scale
user-defined function: effects per chronological age
Beta-proportion GLLAMM chronological age per hearing status effects prior distribution calculation: intelligibility scale
Beta-proportion GLLAMM chronological age per hearing status effects prior distribution definition: entropy scale
Beta-proportion GLLAMM chronological age per hearing status effects prior distribution density plot: unrestricted intelligibility scale
Beta-proportion GLLAMM chronological age per hearing status effects prior distribution density plot: bounded entropy scale
Figure 22: Beta-proportion GLLAMM, chronological age per hearing status effects prior distribution: intelligibility and entropy scale
Child differences e_{i}
The parameter e_{i} was defined in terms of hyperparameters (refer to Section 3.1.5). Specifically, the prior distribution of e_{i} for the normal and beta-proportion GLLAMM is described in Equation 12. For each of the children present in the data there is one prior (refer to Section 6.3). Notably, the same prior is applied to all children with a common mean and standard deviation defined by the hyperparameters. This choice implies that potential intelligibility differences between children are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters.
The left panel of Figure 23 shows a weakly informative prior that restricts the potential intelligibility differences to be in a range of [-1.5, 1.5]. This implies that differences in potential intelligibility between children can be as large as three units of measurement.
However, the right panel of Figure 23 demonstrates that when the prior of e_{i} is transformed to the entropy scale under the normal GLLAMM, the model again anticipates a concentration of data around lower levels of entropy, and even expects the differences to go beyond the feasible range of the outcome. This assumption may initially appear inappropriate; however, considering again the probabilistic representation of the model, other priors hide even more undesirable assumptions.
Code annotations:
standard deviation hyperparameter prior distribution definition
Normal GLLAMM child differences prior distribution calculation: intelligibility scale
Normal GLLAMM child differences prior distribution definition: entropy scale
Normal GLLAMM child differences prior distribution density plot: unrestricted intelligibility scale
Normal GLLAMM child differences prior distribution density plot: bounded entropy scale
Figure 23: Normal GLLAMM, children differences prior distribution: intelligibility and entropy scale
Conversely, the right panel of Figure 24 demonstrates that when the prior of e_{i} is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a high concentration of data around mid levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences in potential intelligibility between children is expected at the entropy scale level.
Code annotations:
standard deviation hyperparameter prior distribution definition
Beta-proportion GLLAMM child differences prior distribution calculation: intelligibility scale
Beta-proportion GLLAMM child differences prior distribution definition: entropy scale
Beta-proportion GLLAMM child differences prior distribution density plot: unrestricted intelligibility scale
Beta-proportion GLLAMM child differences prior distribution density plot: bounded entropy scale
Figure 24: Beta-proportion GLLAMM, children differences prior distribution: intelligibility and entropy scale
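The hierarchical structure of e_{i} can be sketched with a non-centered simulation, in the spirit of Section 3.1.5. This is an illustrative Python sketch, not the study's code; the half-normal hyperprior scale of 0.5 and the number of children are assumed values.

```python
import numpy as np

rng = np.random.default_rng(7)
n_children = 30  # assumed number of children, for illustration

# Hyperparameter: a common standard deviation shared by all children,
# drawn here from an assumed half-normal prior.
sigma_e = np.abs(rng.normal(0.0, 0.5))

# Non-centered parametrization: standard-normal offsets scaled by sigma_e,
# so every child's difference is governed by the same hyperparameter.
z = rng.normal(0.0, 1.0, size=n_children)
e = z * sigma_e

print(e.round(2))
```

The non-centered form (offset times scale) samples the same prior as e_{i} ~ Normal(0, sigma_e) but is easier for HMC to explore, which is why the models use it.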
Sentence-child average differences u_{i}
The parameter u_{i} was defined in terms of hyperparameters (refer to Section 3.1.5). Specifically, the prior distribution of u_{i} for the normal and beta-proportion GLLAMM is described in Equation 13. For each sentence within children present in the data there is one prior, which is later aggregated across all sentences S, as defined in Section 7.1.1. Notably, the same prior is applied to all children with a common mean and standard deviation defined by the hyperparameters. This choice implies that the average potential intelligibility differences among sentences within children are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters.
The left panel of Figure 25 shows a weakly informative prior that restricts u_{i} to be in a range between [-0.5, 0.5]. This implies that average differences in potential intelligibility among sentences within children can be as large as one unit of measurement.
The right panel of Figure 25, in contrast, demonstrates that when the prior of u_{i} is transformed to the entropy scale, the model anticipates a concentration of data around lower levels of entropy. More importantly, the model expects the differences to go beyond the feasible range of the outcome. Again, this prior may initially appear inappropriate, but other priors hide even more undesirable assumptions.
Code annotations:
standard deviation hyperparameter prior distribution definition
Normal GLLAMM sentence-child average differences prior distribution calculation: intelligibility scale
Normal GLLAMM sentence-child average differences prior distribution definition: entropy scale
Normal GLLAMM sentence-child average differences prior distribution density plot: unrestricted intelligibility scale
Normal GLLAMM sentence-child average differences prior distribution density plot: bounded entropy scale
Figure 25: Normal GLLAMM, sentence-children average differences prior distribution: intelligibility and entropy scale
Conversely, the right panel of Figure 26 demonstrates that when the prior of u_{i} is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a high concentration of data around mid levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences between children is expected at the entropy scale level.
Code annotations:
standard deviation hyperparameter prior distribution definition
Beta-proportion GLLAMM sentence-child average differences prior distribution calculation: intelligibility scale
Beta-proportion GLLAMM sentence-child average differences prior distribution definition: entropy scale
Beta-proportion GLLAMM sentence-child average differences prior distribution density plot: unrestricted intelligibility scale
Beta-proportion GLLAMM sentence-child average differences prior distribution density plot: bounded entropy scale
Figure 26: Beta-proportion GLLAMM, sentence-children average differences prior distribution: intelligibility and entropy scale
Random block effect a_{b}
The parameter a_{b} was also defined in terms of hyperparameters (refer to Section 3.1.5). Specifically, the prior distribution of a_{b} for the normal and beta-proportion GLLAMM is described in Equation 14. For each block present in the data there is one prior (refer to Section 6.3). The same prior is applied to all blocks with a common mean and standard deviation defined by the hyperparameters. This choice implies that block differences are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters.
The left panel of Figure 27 shows a weakly informative prior restricting a_{b} to be in a range between [-1.5, 1.5], implying that no particular bias towards positive or negative differences between blocks is present in the priors.
Nevertheless, the right panel of Figure 27 demonstrates that when the prior of a_{b} is transformed to the entropy scale under the normal GLLAMM, the model anticipates a concentration of data around lower levels of entropy. Furthermore, the model expects the differences to go beyond the feasible range of the outcome.
Code annotations:
standard deviation hyperparameter prior distribution definition
Normal GLLAMM block differences prior distribution calculation: intelligibility scale
Normal GLLAMM block differences prior distribution definition: entropy scale
Normal GLLAMM block differences prior distribution density plot: unrestricted intelligibility scale
Normal GLLAMM block differences prior distribution density plot: bounded entropy scale
Figure 27: Normal GLLAMM, block differences prior distribution: intelligibility and entropy scale
In contrast, the right panel of Figure 28 demonstrates that when the prior of a_{b} is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a high concentration of data around mid levels of entropy, but more importantly, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences between blocks is expected at the entropy scale level.
Word-level entropy unexplained variability \sigma_{i}
The prior distribution of \sigma_{i} for the normal GLLAMM is described in Equation 15. For each of the children present in the data there is one prior (refer to Section 6.3). Notably, the same prior is applied to all \sigma_{i}. This choice implies that the parameters for the unexplained word-level entropy variability of each child are presumed to have similar uncertainties prior to the observation of empirical data. Furthermore, even when the model did not consider this robust feature, and estimated one common variability for all children (\sigma instead of \sigma_{i}), the same prior was utilized (see Section 7.1.1 and Section 7.6).
The left panel of Figure 29 shows that the weakly informative prior restricts \sigma_{i} to a positive range, most likely between [0, 1.5], as is required for variability parameters (Depaoli, 2021). However, the right panel of Figure 29 shows that when the prior of \sigma_{i} is transformed to the entropy scale, the model expects certain scores to fall beyond the feasible range of the outcome. This assumption may initially appear inappropriate; however, considering again the probabilistic representation of the model, other priors under the normal GLLAMM hide even more undesirable assumptions.
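The positivity constraint on variability parameters can be illustrated with a short sketch. This is a hypothetical Python illustration; the half-normal form and its 0.5 scale are assumptions standing in for whatever prior Equation 15 specifies.

```python
import numpy as np

rng = np.random.default_rng(3)

# Half-normal draws: positive by construction, with most mass in [0, 1.5].
# The 0.5 scale is an assumed value chosen to match the quoted range.
sigma = np.abs(rng.normal(0.0, 0.5, size=10_000))

print((sigma >= 0).all())        # True: variability cannot be negative
print(np.quantile(sigma, 0.95))  # most of the prior mass sits below ~1
```

Folding a normal distribution at zero is one simple way to encode "positive, and probably not large" without ruling out larger values entirely.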
Code annotations:
Normal GLLAMM word-level entropy variability prior distribution definition: parameter scale
Normal GLLAMM word-level entropy variability prior distribution definition: entropy scale
Normal GLLAMM word-level entropy variability prior distribution density plot: restricted parameter scale
Normal GLLAMM word-level entropy variability prior distribution density plot: bounded entropy scale
Figure 29: Normal GLLAMM, word-level entropy unexplained variability prior distribution: parameter and entropy scale
Sample size M_{i}
The prior distribution of M_{i} for the beta-proportion GLLAMM is described in Equation 16. Similar to the previous parameter, for each of the children present in the data there is one prior (refer to Section 6.3). Notably, the same prior is applied to all M_{i}. This choice implies that the parameters for the unexplained word-level entropy variability of each child are presumed to have similar uncertainties prior to the observation of empirical data. Furthermore, even when the model did not consider this robust feature and estimated one common variability for all children (M instead of M_{i}), the same prior was utilized (see Section 7.1.2 and Section 7.6).
The left and right panels of Figure 30 demonstrate that the prior of M_{i} expects the parameters to be in a reasonable positive range between [0, 5], and within the boundaries of the entropy scores. This implies that no particular bias is present in the word-level entropy unexplained variability, only that it is positive, as expected for measures of variability.
After the careful assessment of the prior implications for each parameter, the expected prior distribution for the potential intelligibility can be constructed. The prior predictive simulation for SI_{i} under the normal and beta-proportion GLLAMM can be described as in Equation 17:
The left panel of Figure 31 shows that the prior restricts the potential intelligibility of children to be mostly between [-6, 6], implying that no particular bias towards positive or negative potential intelligibility is present in the priors.
However, the right panel of Figure 31 demonstrates that when the prior of SI_{i} is transformed to the entropy scale under the normal GLLAMM, the model anticipates predictions of entropy scores beyond their feasible range.
Code annotations:
hearing status effects prior distribution definition
chronological age per hearing status effects prior distribution definition
mean hyperparameter prior distribution definition
standard deviation hyperparameter prior distribution definition
child differences prior distribution definition
sentence-child differences effects prior distribution definition
Normal GLLAMM potential intelligibility prior distribution calculation: intelligibility scale
Normal GLLAMM potential intelligibility prior distribution definition: entropy scale
Normal GLLAMM potential intelligibility prior distribution density plot: unrestricted intelligibility scale
Normal GLLAMM potential intelligibility prior distribution density plot: bounded entropy scale
Figure 31: Normal GLLAMM, potential intelligibility distribution: intelligibility and entropy scale
In contrast, the right panel of Figure 32, demonstrates that when the prior of SI_{i} is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a high concentration of data at the extreme levels of entropy, with a lower probability for scores around mid-levels. Furthermore, the model does not expect data beyond the feasible range of the outcome.
Software
The models were estimated using the software R version 4.2.2 (R Core Team, 2015) and STAN version 2.26.1 (Stan Development Team, 2021). R was utilized to handle the data and produce the outputs for the models. STAN was utilized to access the Bayesian estimation procedure. The procedure implemented in STAN was Hamiltonian Monte Carlo (HMC), specifically the No-U-Turn Sampler (NUTS) (Hoffman and Gelman, 2014)8.
Four Markov chains were implemented for each parameter. Each chain had distinct starting values. Due to the selected estimation procedure (HMC-NUTS), no burn-in or thinning procedure was necessary. Nevertheless, the procedure did require a warm-up phase. Consequently, each chain was run for 4,000 iterations, where the first 2,000 served as a warm-up phase and the remaining 2,000 were considered effective samples from the posterior distribution.
Stationarity, convergence and mixing evaluation
To evaluate the Markov chains’ performance in terms of achieving stationarity, convergence, and good mixing, visual inspections of the trace, trace-rank, and autocorrelation plots (ACF) were conducted, as recommended by McElreath (2020). Furthermore, the assessment of convergence was also supported by the potential scale reduction factor statistics (\hat{R}), developed by Gelman and colleagues (2014). Convergence was considered to be achieved if \hat{R} \leq 1.05.
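The split-\hat{R} computation can be sketched directly. The following Python sketch implements the textbook formula from Gelman and colleagues (2014) on simulated well-mixed chains; it is an illustration, not the diagnostic code used in the study.

```python
import numpy as np

def split_rhat(chains):
    """Potential scale reduction factor (split R-hat) for an array of
    shape (n_chains, n_draws), following Gelman et al. (2014)."""
    m, n = chains.shape
    half = n // 2
    # Split each chain in half to also detect within-chain non-stationarity.
    sub = np.vstack([chains[:, :half], chains[:, half:2 * half]])
    n_sub = sub.shape[1]
    chain_means = sub.mean(axis=1)
    B = n_sub * chain_means.var(ddof=1)   # between-chain variance
    W = sub.var(axis=1, ddof=1).mean()    # within-chain variance
    var_hat = (n_sub - 1) / n_sub * W + B / n_sub
    return float(np.sqrt(var_hat / W))

# Four well-mixed chains of 2,000 post-warm-up draws each.
rng = np.random.default_rng(1)
chains = rng.normal(size=(4, 2000))
rhat = split_rhat(chains)
print(round(rhat, 3))  # close to 1, below the 1.05 threshold
```

Values near 1 indicate that the between-chain and within-chain variances agree, which is the convergence evidence the \hat{R} \leq 1.05 criterion formalizes.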
Posterior information evaluation
To assess whether the parameter’s posterior distribution has been generated with a sufficient number of uncorrelated sampling points, the posterior distribution density is inspected. The number of uncorrelated sampling points indicates the amount of information held by the posterior distribution. Additionally, this assessment is complemented by the effective sample size statistics (n_{\text{eff}}) developed by Gelman and colleagues (2014).
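The link between autocorrelation and the information in the posterior can be illustrated with the AR(1) approximation to the effective sample size. This Python sketch is an illustration only; real n_{\text{eff}} estimators (as in Gelman et al., 2014) use the full autocorrelation sequence rather than the lag-1 value alone.

```python
def ess_ar1(n_draws, rho):
    """Approximate effective sample size for a chain with lag-1
    autocorrelation rho, via the AR(1) formula n * (1 - rho) / (1 + rho)."""
    return n_draws * (1.0 - rho) / (1.0 + rho)

# 4 chains x 2,000 post-warm-up iterations = 8,000 draws in total.
total_draws = 4 * 2000
print(ess_ar1(total_draws, 0.0))  # uncorrelated draws: all 8,000 count
print(ess_ar1(total_draws, 0.5))  # moderate autocorrelation shrinks n_eff
```

This is why a posterior built from correlated draws holds less information than its raw iteration count suggests.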
Fitted models
Twelve statistical models were derived from the general descriptions of the normal and beta-proportion linear, latent and mixed models outlined in Section 7.1.1 and Section 7.1.2. The models differed in the response likelihood component, link function, and the structural equation part. For a comprehensive overview of the parametrization of the entire set of models, refer to Table 2.
Of particular interest is the comparison of models one and seven. The selected model resulting from this comparison aims to address Research question 1 and Research question 2. Furthermore, the comparison and selection among all competing models, with particular interest in models six, nine and twelve, collectively serve to address Research question 3.
Table 2: Parametrization of fitted models.
Model | Response distribution | Robust model | Block effects | \beta_{HS[i]} | \beta_{A} | \beta_{A,HS[i]}
----- | --------------------- | ------------ | ------------- | ------------- | --------- | ---------------
1     | Normal                | No           | Yes           | No            | No        | No
2     | Normal                | No           | Yes           | Yes           | Yes       | No
3     | Normal                | No           | Yes           | Yes           | No        | Yes
4     | Normal                | Yes          | Yes           | No            | No        | No
5     | Normal                | Yes          | Yes           | Yes           | Yes       | No
6     | Normal                | Yes          | Yes           | Yes           | No        | Yes
7     | Beta-prop.            | No           | Yes           | No            | No        | No
8     | Beta-prop.            | No           | Yes           | Yes           | Yes       | No
9     | Beta-prop.            | No           | Yes           | Yes           | No        | Yes
10    | Beta-prop.            | Yes          | Yes           | No            | No        | No
11    | Beta-prop.            | Yes          | Yes           | Yes           | Yes       | No
12    | Beta-prop.            | Yes          | Yes           | Yes           | No        | Yes
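The twelve rows of Table 2 follow a regular grid, which can be enumerated programmatically. This Python sketch is illustrative only; the tuple layout simply mirrors the table's columns.

```python
from itertools import product

# Reconstruct the 12-model grid of Table 2: two response distributions,
# with and without the robust feature, and three fixed-effect structures
# (beta_HS, beta_A, beta_A:HS); all models include block effects.
distributions = ['Normal', 'Beta-prop.']
robust = ['No', 'Yes']
fixed_effects = [('No', 'No', 'No'),    # intercept-only structure
                 ('Yes', 'Yes', 'No'),  # hearing status + chronological age
                 ('Yes', 'No', 'Yes')]  # hearing status + age-by-status

models = [(d, r) + f for d, r in product(distributions, robust)
          for f in fixed_effects]

for i, m in enumerate(models, start=1):
    print(i, m)
```

Models 6 and 12 fall out as the largest normal and beta-proportion specifications, matching the models singled out later for diagnostics.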
The following tabset panel shows the STAN code of all fitted models. The code includes comments for each section of the model. Furthermore, the models are implemented using non-centered priors (refer to Section 3.1.5).
The current research used the Information-Theoretic Approach (Anderson, 2008; Chamberlain, 1965) for model selection and inference. The Information-Theoretic Approach is composed of three steps: (1) the expression of the research hypotheses as statistical models, (2) the selection of the most plausible models, and (3) the production of inferences based on one or multiple selected models. The first step of the approach is developed in Section 7.1, the second in Section 7.6, and the third in Section 8.
Information-Theoretic Approach
Approach to model selection and inference composed of three steps:
Express research hypothesis into models,
Select most plausible models,
Produce inferences based on selected models
Related to the second step, the selected criteria for choosing among competing models were the widely applicable information criterion (WAIC) (Watanabe, 2013) and the Pareto-smoothed importance sampling cross-validation criterion (PSIS), developed by Vehtari and colleagues (2021). The use of WAIC and PSIS is justified by two fundamental characteristics of the criteria. First, WAIC and PSIS incorporate all the information available in the posterior distribution of the parameters. This effectively integrates all the uncertainty inherent in the parameter estimates. Second, the criteria provide the most accurate approximation to the cross-validated deviance (McElreath, 2020), which serves as the closest estimate of the Kullback-Leibler divergence (Kullback and Leibler, 1951). The Kullback-Leibler divergence measures the degree to which a model accurately represents the actual distribution of the data.
Reasons to use WAIC and PSIS
The criteria:
Incorporate all the uncertainty of the posterior distribution, and
Evaluate the degree to which each model deviates from achieving perfect predictive accuracy for the data.
Consequently, by comparing the criteria values across different models this study evaluates the degree to which each model deviates from achieving perfect predictive accuracy for the data (McElreath, 2020). Specifically, models with lower WAIC or PSIS exhibit less deviation from perfect predictive accuracy, compared to alternative models. Additionally, this evaluation provides an indication of the level of uncertainty associated with the findings.
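The WAIC computation behind these comparisons can be sketched from a pointwise log-likelihood matrix. The following Python sketch uses simulated log-likelihood values as a stand-in for a fitted model's output; it follows the standard lppd and p_WAIC definitions (McElreath, 2020) on the deviance scale.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy pointwise log-likelihood matrix: S posterior draws x N observations.
# In practice this comes from the fitted model's generated quantities.
S, N = 2000, 50
log_lik = rng.normal(-1.0, 0.2, size=(S, N))

# lppd: log pointwise predictive density, averaging over posterior draws.
lppd = np.sum(np.log(np.mean(np.exp(log_lik), axis=0)))

# p_waic: effective number of parameters, the pointwise posterior variance.
p_waic = np.sum(np.var(log_lik, axis=0, ddof=1))

waic = -2.0 * (lppd - p_waic)
print(round(waic, 1))  # lower values = less deviation from perfect accuracy
```

Because both terms are sums over observations, WAIC also yields pointwise contributions, which is what makes model-to-model standard errors of the difference computable.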
Results
Bayesian inference results
In this section, the results of the Bayesian inference procedures are presented. Models 6 and 12 are selected as representative, as they have the largest number of parameters among all the models (refer to Section 7.6). The selected models are then utilized to illustrate the quality of the Bayesian estimates, in terms of stationarity, convergence, mixing, and the information available in the posterior distribution of the parameters. It is crucial to emphasize that the authors conducted a meticulous inspection of all the fitted models. All models achieved similar results to those selected for illustration.
The following code loads the model information. The function file_id() is a user-defined function that identifies, within a particular directory, the files generated by STAN as a result of the fitting process.
Code annotations:
user-defined function: displays concise parameter estimate information for selected model
Stationarity, convergence and mixing
Figure 33 through Figure 40 show the trace, trace-rank, and ACF plots for selected parameters of the chosen models. First, the left panels of the figures display the trace plots. These plots illustrate that for each parameter, the chains' post-warm-up iterations visually converge to a mean value with a constant variance. Second, the middle panels of the figures show the trace-rank plots. These panels reveal that each parameter's chains explored their respective parameter space in a seemingly random manner. Third, the right panels of the figures show the ACF plots. These panels demonstrate that each parameter's chains have relatively low autocorrelations.
Furthermore, Figure 41 shows the \hat{R} statistic for each parameter of the selected model. The figure reveals that the statistic supports the assessment of convergence in the chains' post-warm-up iterations for each parameter, as no parameter's \hat{R} exceeds the threshold of 1.05.
Figure 42 through Figure 46 show the density plots for selected parameters of the chosen models. The figures reveal that the parameters' posterior distributions hold enough information. This implies the posterior distributions have been generated with sufficient uncorrelated sampling points. Furthermore, Figure 47 shows the plots of the effective sample size statistic (n_{\text{eff}}) for each parameter. The left and right panels of the figure demonstrate that most of the parameters have n_{\text{eff}} > 2000, with a few others falling below the 2,000 post-warm-up iterations set by the Bayesian procedure (refer to Section 7.3). In aggregate, this provides supporting evidence that the parameters' posterior distributions have sufficient information about the parameters, and that they make substantive sense compared to the models' prior beliefs.
On the other hand, the figures also reveal that all the posteriors are unimodal distributions with values centered around a mean. This provides additional evidence of stationarity and convergence of the distributions, as described in Section 8.1.1.
Code
pars = c('a','aHS[1]','aHS[2]','bAmHS[1]','bAmHS[2]')   # parameters of interest
par( mfrow=c(2,3) )
# user-defined function: density plot with HPDI intervals for selected parameters
dens_plot( stan_object=model06, p=0.95, pars=pars[1], x_lab=pars[1] )
for( i in 2:length(pars) ){
  dens_plot( stan_object=model06, p=0.95, pars=pars[i], x_lab=pars[i] )
}
par( mfrow=c(1,1) )
Figure 42: Model 06, density plots of selected parameters
Code
pars = c('m_b','m_i','m_u','s_b','s_i','s_u')   # parameters of interest
par( mfrow=c(2,3) )
# user-defined function: density plot with HPDI intervals for selected parameters
dens_plot( stan_object=model12, p=0.95, pars=pars[1], x_lab=pars[1] )
for( i in 2:length(pars) ){
  dens_plot( stan_object=model12, p=0.95, pars=pars[i], x_lab=pars[i] )
}
par( mfrow=c(1,1) )
Figure 43: Model 12, density plots of selected parameters
Code
pars = paste0('s_w[', 1:6, ']')   # parameters of interest
par( mfrow=c(2,3) )
# user-defined function: density plot with HPDI intervals for selected parameters
dens_plot( stan_object=model06, p=0.95, pars=pars[1], x_lab=pars[1] )
for( i in 2:length(pars) ){
  dens_plot( stan_object=model06, p=0.95, pars=pars[i], x_lab=pars[i] )
}
par( mfrow=c(1,1) )
Figure 44: Model 06, density plots of selected parameters
Code
pars = paste0('Mw[', 1:6, ']')   # parameters of interest
par( mfrow=c(2,3) )
# user-defined function: density plot with HPDI intervals for selected parameters
dens_plot( stan_object=model12, p=0.95, pars=pars[1], x_lab=pars[1] )
for( i in 2:length(pars) ){
  dens_plot( stan_object=model12, p=0.95, pars=pars[i], x_lab=pars[i] )
}
par( mfrow=c(1,1) )
Figure 45: Model 12, density plots of selected parameters
Code
pars = paste0('SI[', 1:6, ']')   # parameters of interest
par( mfrow=c(2,3) )
# user-defined function: density plot with HPDI intervals for selected parameters
dens_plot( stan_object=model06, p=0.95, pars=pars[1], x_lab=pars[1] )
for( i in 2:length(pars) ){
  dens_plot( stan_object=model06, p=0.95, pars=pars[i], x_lab=pars[i] )
}
par( mfrow=c(1,1) )
Figure 46: Model 06, density plots of selected parameters
WAIC and PSIS indicate that the beta-proportion model 7 performs substantially better than the normal model 1. This is because its predictions fall within the limits of the data, whereas the normal model shows a clear sign of underfitting.
Most of the variability is located at the word level. If this is not controlled for, the precision of the tests would be overestimated: because the observations are correlated, each observation counts as less than one effective piece of information.
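The loss of effective information under correlated observations can be quantified with the classical design-effect formula. This Python sketch is purely illustrative; the counts and the intraclass correlation are invented numbers, not estimates from the study.

```python
def effective_n(n_total, cluster_size, icc):
    """Approximate effective sample size under clustering, via the
    design-effect formula n / (1 + (m - 1) * ICC)."""
    return n_total / (1.0 + (cluster_size - 1.0) * icc)

# Invented illustration: 6,000 word-level observations, 10 words per
# sentence, and a within-sentence correlation of 0.5.
print(effective_n(6000, 10, 0.5))  # far fewer effective observations than 6,000
```

Ignoring the clustering would treat all 6,000 observations as independent, which is exactly the overestimated precision described above.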
The unexplained variability of children and sentences follows in terms of magnitude. These estimates describe how much variability is expected from each source.
The variability of the block effects indicates a possible lack of inter-listener reliability. If this were not controlled for, the lack of reliability might creep into the hypothesis tests of the parameters.
Baker, F. (1998). An investigation of the item parameter recovery characteristics of a Gibbs sampling procedure. Applied Psychological Measurement, 22(22), 153–169. https://doi.org/10.1177/01466216980222005
Baldwin, S., and Fellingham, G. (2013). Bayesian methods for the analysis of small sample multilevel data with a complex variance structure. Journal of Psychological Methods, 18(2), 151–164. https://doi.org/10.1037/a0030642
Boonen, N., Kloots, H., Nurzia, P., and Gillis, S. (2021). Spontaneous speech intelligibility: Early cochlear implanted children versus their normally hearing peers at seven years of age. Journal of Child Language, 1–26. https://doi.org/10.1017/S0305000921000714
Brooks, S., Gelman, A., Jones, G., and Meng, X. (2011). Handbook of Markov chain Monte Carlo (1st ed.). Chapman; Hall, CRC. https://doi.org/10.1201/b10905
Chamberlain, T. (1965). The method of multiple working hypotheses. Science, 148(3671), 754–759. Retrieved from https://www.jstor.org/stable/1716334
Denwood, M. (2016). runjags: An R package providing interface utilities, model templates, parallel computing methods and additional distributions for MCMC models in JAGS. Journal of Statistical Software, 71(9), 1–25. https://doi.org/10.18637/jss.v071.i09
Depaoli, S. (2014). The impact of inaccurate “informative” priors for growth parameters in Bayesian growth mixture modeling. Journal of Structural Equation Modeling, 21, 239–252. https://doi.org/10.1080/10705511.2014.882686
Depaoli, S., and Schoot, R. van de. (2017). Improving transparency and replication in Bayesian statistics: The WAMBS-checklist. Psychological Methods, 22(2), 240–261. https://doi.org/10.1037/met0000065
Flipsen, P. (2006). Measuring the intelligibility of conversational speech in children. Clinical Linguistics & Phonetics, 20(4), 303–312. https://doi.org/10.1080/02699200400024863
Freeman, V., Pisoni, D., Kronenberger, W., and Castellanos, I. (2017). Speech intelligibility and psychosocial functioning in deaf children and teens with cochlear implants. Journal of Deaf Studies and Deaf Education, 22(3), 278–289. https://doi.org/10.1093/deafed/enx001
Gabry, J., and Češnovar, R. (2022). Cmdstanr: R interface to ’CmdStan’.
Gelman, A., Carlin, J., Stern, H., Dunson, D., Vehtari, A., and Rubin, D. (2014). Bayesian data analysis (3rd ed.). Chapman; Hall/CRC.
Heuven, V. van. (2008). Making sense of strange sounds: (Mutual) intelligibility of related language varieties. A review. International Journal of Humanities and Arts Computing, 2(1-2), 39–62. https://doi.org/10.3366/E1753854809000305
Hoffman, M., and Gelman, A. (2014). The No-U-Turn Sampler: Adaptively setting path lengths in Hamiltonian Monte Carlo. Journal of Machine Learning Research, 15, 1593–1623. Retrieved from https://www.jmlr.org/papers/volume15/hoffman14a/hoffman14a.pdf
Kim, S., and Cohen, A. (1999). Accuracy of parameter estimation in Gibbs sampling under the two-parameter logistic model. Annual Meeting of the American Educational Research Association. American Educational Research Association. Retrieved from https://eric.ed.gov/?id=ED430012
Kullback, S., and Leibler, R. (1951). On information and sufficiency. The Annals of Mathematical Statistics, 22(1), 79–86. Retrieved from http://www.jstor.org/stable/2236703
Lagerberg, T., Asberg, J., Hartelius, L., and Persson, C. (2014). Assessment of intelligibility using children’s spontaneous speech: Methodological aspects. International Journal of Language and Communication Disorders, 49(2), 228–239. https://doi.org/10.1111/1460-6984.12067
Lambert, P., Sutton, A., Burton, P., Abrams, K., and Jones, D. (2006). How vague is vague? A simulation study of the impact of the use of vague prior distributions in MCMC using WinBUGS. Journal of Statistics in Medicine, 24(15), 2401–2428. https://doi.org/10.1002/sim.2112
Lee, Y., and Nelder, J. A. (1996). Hierarchical generalized linear models. Journal of the Royal Statistical Society: Series B (Methodological), 58(4), 619–656. https://doi.org/10.1111/j.2517-6161.1996.tb02105.x
Martin, J., and McDonald, R. (1975). Bayesian estimation in unrestricted factor analysis: A treatment for heywood cases. Psychometrika, (40), 505–517. https://doi.org/10.1007/BF02291552
Plummer, M., Best, N., Cowles, K., and Vines, K. (2006). CODA: Convergence diagnosis and output analysis for MCMC. R News, 6(1), 7–11. Retrieved from https://journal.r-project.org/archive/
R Core Team. (2015). R: A language and environment for statistical computing. Vienna, Austria: R Foundation for Statistical Computing. Retrieved from http://www.R-project.org/
Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2004a). Generalized multilevel structural equation modeling. Psychometrika, 69(2), 167–190. https://doi.org/10.1007/BF02295939
Rabe-Hesketh, S., Skrondal, A., and Pickles, A. (2004c). Maximum likelihood estimation of limited and discrete dependent variable models with nested random effects. Journal of Econometrics, 128(2), 301–323. https://doi.org/10.1016/j.jeconom.2004.08.017
Seaman, S. jr., J., and Stamey, J. (2011). Hidden dangers of specifying noninformative priors. The American Statistician, 66(2), 77–84. https://doi.org/10.1080/00031305.2012.695938
Stan Development Team. (2020). RStan: The R interface to Stan. Retrieved from http://mc-stan.org/
Stan Development Team. (2021). Stan modeling language users guide and reference manual, version 2.26. Vienna, Austria. Retrieved from https://mc-stan.org
Vehtari, A., Gabry, J., Magnusson, M., Yao, Y., Bürkner, P., Paananen, T., and Gelman, A. (2023). Loo: Efficient leave-one-out cross-validation and WAIC for bayesian models. Retrieved from https://mc-stan.org/loo/
Vehtari, A., Gelman, A., and Gabry, J. (2017). Practical Bayesian model evaluation using leave-one-out cross-validation and WAIC. Statistics and Computing, 27, 1413–1432. https://doi.org/10.1007/s11222-016-9696-4
Vehtari, A., Simpson, D., Gelman, A., Yao, Y., and Gabry, J. (2021). Pareto smoothed importance sampling. Retrieved from https://arxiv.org/abs/1507.02646
Watanabe, S. (2013). A widely applicable Bayesian information criterion. Journal of Machine Learning Research, 14, 867–897. Retrieved from https://dl.acm.org/doi/10.5555/2567709.2502609
Whitehill, T., and Chau, C. (2004). Single-word intelligibility in speakers with repaired cleft palate. Clinical Linguistics and Phonetics, 18, 341–355. https://doi.org/10.1080/02699200410001663344
Wickham, H. (2007). Reshaping data with the reshape package. Journal of Statistical Software, 21(12), 1–20. Retrieved from http://www.jstatsoft.org/v21/i12/
Wickham, H., Averick, M., Bryan, J., Chang, W., McGowan, L. D., François, R., … Yutani, H. (2019). Welcome to the tidyverse. Journal of Open Source Software, 4(43), 1686. https://doi.org/10.21105/joss.01686
Wickham, H., François, R., Henry, L., Müller, K., and Vaughan, D. (2023). Dplyr: A grammar of data manipulation. Retrieved from https://CRAN.R-project.org/package=dplyr
Footnotes
For a thorough explanation of the Bayesian inference procedures the reader can refer to Kruschke (2015) or McElreath (2020).↩︎
The reader can refer to Brooks et al. (2011) for a detailed treatment on MCMC methods.↩︎
An interested reader can further refer to McElreath (2020) for a detailed explanation of grid approximation.↩︎
An interested reader can refer to McElreath (2020) for a detailed explanation of priors and hyperpriors.↩︎
An interested reader can refer to McElreath (2020, p. 420) for a detailed explanation of the challenges and reparametrizations required to set priors with hyperparameters.↩︎
An interested reader can refer to McElreath (2020), Gorinova et al. (2019), and Neal (2003) for further details.↩︎
An interested reader can refer to McElreath (2020) for a detailed explanation of prior predictive simulations.↩︎
The reader can refer to Brooks et al. (2011) for a detailed treatment on HMC methods.↩︎
Source Code
---title: 'Speech Intelligibility'subtitle: 'A Bayesian generalized linear, latent and mixed modeling approach'author: - name: Rivera Espejo, Jose (corresponding) email: JoseManuel.RiveraEspejo@uantwerpen.be url: https://www.uantwerpen.be/en/staff/jose-manuel-rivera-espejo_23166/ attributes: corresponding: true affiliations: - name: University of Antwerp department: Training and Education Sciences city: Antwerp country: Belgium postal-code: 2000 - name: De Maeyer, Sven email: sven.demaeyer@uantwerpen.be url: https://www.uantwerpen.be/en/staff/sven-demaeyer/ attributes: corresponding: false affiliations: - name: University of Antwerp department: Training and Education Sciences city: Antwerp country: Belgium postal-code: 2000 - name: Gillis, Steven email: steven.gillis@uantwerpen.be url: https://www.clips.uantwerpen.be/~gillis/index.html attributes: corresponding: false affiliations: - name: University of Antwerp department: Computational Linguistics, and Psycholinguistics Research Centre city: Antwerp country: Belgium postal-code: 2000date: "`r format(Sys.time(), '%d %B %Y')`"bibliography: bibliography.bib# geometry:# - left=1.0in# - textwidth=4.5in# - marginparsep=2in# - marginparwidth=2.25inexecute: cache: trueformat: html: cite-method: citeproc csl: apa-6th-edition-no-ampersand.csl keep-tex: true embed-resources: true link-external-icon: true link-external-newwindow: true theme: light: materia dark: darkly fontsize: 11pt smooth-scroll: true toc: true toc-depth: 3 toc-expand: 1 toc-title: 'Contents' toc-location: left number-sections: false # number-depth: 3 anchor-sections: true html-math-method: katex citations-hover: true footnotes-hover: true code-fold: true code-summary: "Code" code-overflow: scroll code-tools: true code-annotations: hover code-line-numbers: false code-copy: true tbl-cap-location: bottom---# AimThe purpose of this walk-through is to introduce and document the analyses developed for the study **_Speech Intelligibility: A Bayesian generalized 
linear, latent and mixed modeling approach_** (citation needed). The R packages utilized in the production of this document can be divided in three groups. First, the packages utilized to generate this document: `RColorBrewer`[@RColorBrewer_2022] and `quarto`[@Quarto_2022]. Second, the packages used for the handling the data: `stringr`[@stringr_2022], `dplyr`[@dplyr_2023], `tidyverse`[@tidyverse_2019], and `reshape2`[@reshape_2007]. Lastly, the packages used for the Bayesian implementation: `coda`[@coda_2006], `loo`[@loo_2023; @loo_2017], `cmdstanr`[@cmdstanr_2022], `rstan`[@RStan_2020], `runjags`[@runjags_2016], and `rethinking`[@rethinking_2021].::: {.column-margin}**_Packages_**:::# OrganizationIn this walk-through, @sec-interludes introduce various background topics that are relevant to the present study. These topics enable readers to progress smoothly through this research. Specifically, @sec-interlude1 provides a brief explanation of how Bayesian inference procedures work and their importance for this research. @sec-interlude2 is devoted to explaining the difference between two particular distributions, the normal and the beta-proportion distribution, and their role on modeling bounded data. @sec-interlude3 explain the (generalized) linear mixed models, elaborating on their role in modeling (non)normal clustered and bounded data. @sec-interlude4 illustrate the concept of measurement error and the role of latent variables to overcome the problems arising from it. Lastly, @sec-interlude5 explains the effects of the data distributional departures on the parameter estimates, and its importance for this research.The specific analysis for this study are elaborated from section @sec-researchG onwards. Particularly, @sec-researchG elaborates on the research gaps in the study of intelligibility. @sec-researchQ introduces the research questions that guide the study. @sec-data explores the data and its implications. 
@sec-methods thoroughly develop the methods to analyze the data. Lastly, @sec-results provides answers to the research question at hand.# Interludes {#sec-interludes}## Bayesian inference {#sec-interlude1}### Theory {#sec-bayesian_theory}Bayesian inference is an approach to statistical modeling and inference that is primarily based on the *Bayes' theorem*. The procedure aims to derive appropriate inference statements about a set of parameters by revising and updating their occurrence probabilities in light of new evidence [@Everitt_et_al_2010]. The procedure consist on defining the model assumptions in the form of a *likelihood* for the outcome and a set of *prior distributions* for the parameters of interest. This allows the derivation of a set of *posterior distributions* for the parameters, from which the statistical inferences are derived [^1]. As an example, a simple linear regression model with a parameter $\beta$ can be encoded under the Bayesian inference paradigm in the following form:[^1]: For a thorough explanation of the Bayesian inferences procedures the reader can refer to Kruschke [-@Kruschke_2015] or McElreath [-@McElreath_2020].::: {.column-margin}**_Bayesian inference_** Approach to statistical modeling and inference, that aims to derive appropriate inference statements about one or a set of parameters by revising and updating their probabilities in light of new evidence [@Everitt_et_al_2010].:::$$ \begin{align}P(\beta | Y, X ) &= \frac{ P( Y | \beta, X ) \cdot P( \beta ) }{ P( Y ) }\end{align}$$ {#eq-bayes1}where $P( Y| \beta, X )$ defines the *likelihood* of the outcome, which represents the probability distribution of the outcome $Y$, given the parameter $\beta$ and covariate $X$. Hence, the *likelihood* is the probability distribution assumption that a set of observations occur, conditioned on the value of a set of parameters [@Everitt_et_al_2010]. 
In other words, it describes the assumption about the underlying process that give rise to the data.::: {.column-margin}**_Likelihood_** Probability distribution assumption that a set of observations occur, given the value of a set of parameters [@Everitt_et_al_2010].:::$P( \beta )$ defines the *prior distribution* of the parameter $\beta$. A *prior* is a probability distribution that summarizes the available information about a parameter, prior to observing empirical data [@Everitt_et_al_2010]. In other words, it reflects the knowledge about a parameter prior to observing any data.::: {.column-margin}**_Prior distribution_** Probability distribution that summarize information about a random variable or parameter at a given time point, prior to observing empirical data [@Everitt_et_al_2010].:::$P( Y )$ defines the probability distribution of the data, which represents the *evidence* of the observed empirical data.As a result $P( \beta | Y, X )$, which denotes the *posterior distribution* of the parameter, describes the probability distribution of $\beta$ after observing empirical data. Specifically, it combines the *likelihood* of the outcome and the parameter's *prior distribution*, considering that empirical data have been observed.::: {.column-margin}**_Posterior distribution_** Probability distribution that summarize information about a random variable or parameter after having obtained new information from empirical data [@Everitt_et_al_2010].:::Before implementing the Bayesian inference procedures, two important concepts related to @eq-bayes1 need to be understood. First, the evidence of the empirical data $P(Y)$ serves as a normalizing constant. This is just another way of saying that the numerator in the equation is rescaled by a constant obtained from calculating $P(Y)$. 
Consequently, without loosing generalization, the equation can be succinctly rewritten in the following form:$$ \begin{align}P(\beta | Y, X ) &\propto P( Y | \beta, X ) \cdot P( \beta ) \\\end{align}$$ {#eq-bayes2}where $\propto$ denotes the proportional symbol. This implies that the posterior distribution of $\beta$ is proportional (up to a constant) to the multiplication of the outcome's likelihood and the parameter's prior distribution. This definition make the *calculation* of the posterior distribution easier, by separating the parameter's *updating process* from the integration of new empirical data (this will be clearly seen in the code provided in @sec-howitworks).Second, a dataset usually have multiple observations of the outcome $Y$ and covariate $X$, in the form of $y_{i}$ and $x_{i}$. Therefore, by law of probabilities and assuming independence among the observations, the likelihood of the full dataset can be rewritten as the product of all individual observation likelihoods. Consequently, @eq-bayes2 can also be rewritten as follows:$$ \begin{align}P(\beta | Y, X ) &\propto \prod_{i=1}^{n} P( y_{i} | \beta, x_{i} ) \cdot P( \beta ) \end{align}$$ {#eq-bayes3}### Estimation methods {#sec-estimation_methods}Several methods within the Bayesian inference procedures can be utilized to *estimate* the posterior distribution of the parameter, and most of these fall into the category of *Markov Chain Monte Carlo methods (MCMC)*. *MCMC* are methods to indirectly simulate random observations from probability distributions using stochastic processes [@Everitt_et_al_2010] [^2]. [^2]: The reader can refer to Brooks et al. 
[-@Gelman_et_al_2011] for a detailed treatment on MCMC methods.::: {.column-margin}**_Markov Chain Monte Carlo (MCMC)_** Methods to indirectly simulate random observations from probability distributions using stochastic processes [@Everitt_et_al_2010].:::However, when the parameters of interest are not large in number, a useful pedagogical method to produce the posterior distribution is the *grid approximation* method. Through this method, an excellent approximation of the parameter's posterior distribution can be achieved by considering a finite candidate list of parameter values. This method is used in @sec-howitworks to illustrate how the Bayesian inference works [^3]. [^3]: An interested reader can further refer to McElreath [-@McElreath_2020] for a detailed explanation of grid approximation. ::: {.column-margin}**_Grid approximation_** Method to indirectly simulate random observations from low dimensional continuous probability distributions, by considering a finite candidate list of parameter values [@McElreath_2020].:::### How does it work? {#sec-howitworks}A simple Bayesian linear regression model can be written in the following form:$$ \begin{align*}y_{i} &= \beta \cdot x_{i} + e_{i} \\e_{i} &\sim \text{Normal}( 0, 1 ) \\\beta &\sim \text{Uniform}( -20, +20 )\end{align*}$$where $y_{i}$ denotes the outcome's observation $i$, $\beta$ the expected effect of the observed covariate $x_{i}$ on the outcome, and $e_{i}$ the outcome's residual in observation $i$. Furthermore, the model assumes the residual $e_{i}$ is also normally distributed with mean zero and standard deviation equal to one. 
Lastly, prior to observe any data, it is assumed that $\beta$ is uniformly distributed within the range of $[-20,+20]$.However, a more convenient generalized manner to represent the same linear regression model is as follows:$$ \begin{align*}y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\\mu_{i} &= \beta \cdot x_{i} \\\beta &\sim \text{Uniform}( -20, +20 )\end{align*}$$In this definition, the component of the Bayesian inference procedure detailed in @sec-bayesian_theory are more easily spotted. First, about the likelihood, the outcome is assumed to be normally distributed with mean $\mu_{i}$ and standard deviation equal to one. Second, it is assumed $\beta$ has a prior that is a normal distribution with mean zero and standard deviation equal to one. Additionally, the equations reveal that the mean of the outcome $\mu_{i}$ is modeled by a linear predictor composed by the covariate $x_{i}$ and its effect on the outcome $\beta$.For illustration purposes, a simulated regression with $n=100$ observations was generated assuming $\beta=0.2$. @fig-regression_simulation shows the scatter plot of the generated data (see code below). The grid approximation method is used to generate random observations from the posterior distribution of $\beta$. Two noteworthy results emerge from the approach. Firstly, once the posterior distribution is generated, various summaries can be used to make inferences about the parameter of interest (refer to the code output below). Secondly, when considering a dataset with $n=100$ observations, the influence of the prior on the posterior distribution of $\beta$ is negligible. 
Specifically, prior to observe any data, assuming that $\beta$ could take any value within the range of $[-20,+20]$ with equal probability (left panel of @fig-bayesian_inference) did not have a substantial impact on the distribution of $\beta$ after empirical data was observed (right panel of @fig-bayesian_inference).```{r}#| label: code-regression_simulation#| fig-cap: ''#| echo: true#| warning: falseset.seed(12345) # <1>n =100# <2>b =0.2# <3>x =rnorm( n=n, mean=0, sd=1 ) # <4>mu_y = b*x # <5>y =rnorm( n=n, mean=mu_y, sd=1 ) # <6>```1. replication seed2. simulation sample size3. covariate effect4. covariate simulation5. linear predictor on outcome mean6. outcome simulation```{r}#| label: code-bayesian_inference#| fig-cap: ''#| echo: true#| warning: false# grid approximationNgp =1000# <1>b_cand =seq( from=-20, to=20, length.out=Ngp ) # <2>udf =function(i){ b_cand[i]*x } # <3>mu_y =sapply( 1:length(b_cand), udf ) # <4>udf =function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) } # <5>y_lik =sapply( 1:length(b_cand), udf ) # <6>b_prior =rep( 1/40, length(b_cand) ) # <7>b_prop = y_lik * b_prior # <8>b_post = b_prop /sum(b_prop) # <9>```1. number of points in candidate list 2. candidate list for parameter3. user defined function: linear predictor for each candidate4. calculation of the linear predictor for each candidate5. user defined function: product of individual observation likelihoods6. outcome data likelihood7. uniform prior distribution for parameter (min=-20, max=20)8. proportional posterior distribution for parameter9. 
posterior distribution for parameter```{r}#| label: code-bayesian_summary#| fig-cap: ''#| echo: true#| warning: falsepaste0( 'true beta = ', b ) # <1>b_exp =sum( b_cand * b_post ) # <2>paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )b_max = b_cand[ b_post==max(b_post) ] # <3>paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )b_var =sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) ) # <4>paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )b_prob =sum( b_post[ b_cand >0 ] ) # <5>paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )```1. true values for the parameter2. expected value for the parameter3. maximum probability value for the parameter4. standard deviation for the parameter5. probability that the parameter is greater than zero```{r}#| label: fig-regression_simulation#| fig-cap: 'Outcome simulation'#| echo: true#| warning: false#| fig-height: 4#| fig-width: 5plot( x, y, xlim=c(-3,3), ylim=c(-3,3), # <1>pch=19, col=rgb(0,0,0,alpha=0.3) )abline( a=0, b=b, lty=2, col='blue' )abline( a=0, b=b_exp, lty=2, col='red' )legend( 'topleft', legend=c('true', 'expected'),bty='n', col=c('blue','red'), lty=rep(2,3) )```1. simulation plot```{r}#| label: fig-bayesian_inference#| fig-cap: 'Bayesian inference: grid approximation'#| echo: true#| warning: false#| fig-height: 4#| fig-width: 10par(mfrow=c(1,2))plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5), # <1>main='Prior distribution',xlab=expression(beta), ylab='probability' ) abline( v=0, lty=2, col='gray' )plot( b_cand, b_post, type='l', xlim=c(-1,1), # <2>main='Posterior distribution',xlab=expression(beta), ylab='probability' ) abline( v=c(0, b, b_exp), lty=2, col=c('gray','blue','red') )legend( 'topleft', legend=c('true', 'expected'),bty='n', col=c('blue','red'), lty=rep(2,3) )par(mfrow=c(1,1))```1. prior distribution density plot2. 
posterior distribution density plot### Priors and their effects {#sec-prior_effects}Prior to observing empirical data, assuming the parameter could take any value within within the range of $[-20,+20]$ with equal probability is not the only prior assumption that can be made. Different levels of uncertainty associated with a parameter can be encoded by different priors. This concept illustrated with @fig-prior_effects1 through @fig-prior_effects3, where three different types of priors are used to encode three levels of uncertainty about the parameter $\beta$.```{r}#| label: code-prior_effects#| fig-cap: ''#| echo: true#| warning: false# grid approximationNgp =1000# <1>post =data.frame( b_cand=seq( from=-20, to=20, length.out=Ngp ) ) # <2>ud_func =function(i){ post$b_cand[i]*x } # <3>mu_y =sapply( 1:length(post$b_cand), ud_func ) # <4>ud_func =function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) } # <5>y_lik =sapply( 1:length(post$b_cand), ud_func ) # <6>post$b_prior1 =rep( 1/40, length(post$b_cand) ) # <7>post$b_prior2 =dnorm( post$b_cand, mean=0, sd=0.5 ) # <8>post$b_prior3 =dnorm( post$b_cand, mean=0.2, sd=0.05 ) # <9>nam =c()for( i in1:3 ){ # <10> b_prop = y_lik * post[, paste0('b_prior',i) ] nam =c(nam, paste0('b_post',i) ) post =cbind(post, data.frame( b_prop /sum(b_prop) ) ) }names(post)[5:7] = nam```1. number of points in candidate list 2. candidate list for parameter3. user defined function: linear predictor for each candidate4. calculation of the linear predictor for each candidate5. user defined function: product of individual observation likelihoods6. outcome data likelihood7. prior 1: uniform prior distribution (min=-20, max=+20)8. prior 2: normal prior distribution (mean=0, sd=0.5)9. prior 3: normal prior distribution (mean=0.2, sd=0.05)10. posterior distribution for each priorFirst, the distribution depicted in @fig-prior_effects1 assumes $\beta \sim \text{Uniform}(-20, +20)$ (similar to what is observed in @sec-howitworks). 
The distribution does not restrain the effect of $\beta$ to be more probable in any range within $[-20, +20]$. This type of distribution is commonly referred to as a *non-informative prior*. A *non-informative prior* distribution reflects the committal of a distribution to a wide range of values within a specific parameter space prior to observe any data [@Everitt_et_al_2010].::: {.column-margin}**_Non-informative priors_** Prior distribution that reflects the committal of a distribution to a wide range of values within a specific parameter space prior to observe any data [@Everitt_et_al_2010].:::```{r}#| label: fig-prior_effects1#| fig-cap: 'Bayesian inference: posterior distributions with non-informative prior distribution.'#| echo: true#| warning: false#| fig-height: 4#| fig-width: 10par(mfrow=c(1,2))plot( post[, c('b_cand','b_prior1')], type='l', # <1>xlim=c(-1.5,1.5), main='Prior distribution',xlab=expression(beta), ylab='probability' )abline( v=0, lty=2, col='gray' )plot( post[, c( 'b_cand','b_post1')], type='l', # <2>xlim=c(-1,1), main='Posterior distribution',xlab=expression(beta), ylab='probability' ) abline( v=c(0, b), lty=2, col=c('gray','blue') )par(mfrow=c(1,1))```1. prior distribution density plot2. posterior distribution density plotSecond, the distribution described in @fig-prior_effects2 assumes $\beta \sim \text{Normal}(0, 0.5)$. Consequently, the effect of $\beta$ is more probable within the range $[-1,+1]$, with less probability associated with parameter values outside this range. This is a an example of a *weakly-informative prior distribution*. *Weakly informative priors* are distributions that exhibit no bias towards positive or negative effects, but constrain very weakly the effects of the parameter within a realistic range [@McElreath_2020]. 
::: {.column-margin}**_Weakly informative priors_** Prior distributions that exhibit no bias towards positive or negative effects, but constrain very weakly the effects of the parameter within a realistic range [@McElreath_2020].:::```{r}#| label: fig-prior_effects2#| fig-cap: 'Bayesian inference: posterior distributions with weakly-informative prior distribution.'#| echo: true#| warning: false#| fig-height: 4#| fig-width: 10par(mfrow=c(1,2))plot( post[, c('b_cand','b_prior2')], type='l', # <1>xlim=c(-1.5,1.5), main='Prior distribution',xlab=expression(beta), ylab='probability' )abline( v=0, lty=2, col='gray' )plot( post[, c( 'b_cand','b_post2')], type='l', # <2>xlim=c(-1,1), main='Posterior distribution',xlab=expression(beta), ylab='probability' ) abline( v=c(0, b), lty=2, col=c('gray','blue') )par(mfrow=c(1,1))```1. prior distribution density plot2. posterior distribution density plotThird, the distribution described in @fig-prior_effects3 assumes $\beta \sim \text{Normal}(0.2, 0.05)$. As a result, the effect of $\beta$ is more probable within the range $[0.1,0.3]$, with less probability associated with parameter values outside this range. This is an example of an *informative prior distribution*. *Informative priors* are distributions that expresses specific and definite information about a parameter [@McElreath_2020]. 
::: {.column-margin}**_Informative priors_** Prior distributions that that expresses specific and definite information about a parameter [@McElreath_2020].:::```{r}#| label: fig-prior_effects3#| fig-cap: 'Bayesian inference: posterior distributions with informative prior distributions.'#| echo: true#| warning: false#| fig-height: 4#| fig-width: 10par(mfrow=c(1,2))plot( post[, c('b_cand','b_prior3')], type='l', # <1>xlim=c(-1.5,1.5), main='Prior distribution',xlab=expression(beta), ylab='probability' )abline( v=0, lty=2, col='gray' )plot( post[, c( 'b_cand','b_post3')], type='l', # <2>xlim=c(-1,1), main='Posterior distribution',xlab=expression(beta), ylab='probability' ) abline( v=c(0, b), lty=2, col=c('gray','blue') )par(mfrow=c(1,1))```1. prior distribution density plot2. posterior distribution density plotLastly, regarding the influence of different priors on the posterior distributions, @fig-prior_effects1 and @fig-prior_effects2 reveals that non-informative and weakly-informative priors have a negligible influence on the posterior distribution. Both priors result in similar posteriors. Furthermore, the figure shows the data sample size $n=100$ is still not enough to provide an unbiased and precise estimation of the true effect. In contrast, @fig-prior_effects3 shows that, informative priors can have a meaningful influence in the posterior distribution. In this particular case, the prior helps to estimate an unbiased and more precise effect. This results shows that when the data sample size is not sufficiently large, the prior assumptions can play a significant role on obtaining appropriate parameter estimates.### What are Hyperpriors? {#sec-hyperpriors}It is crucial to consider that for a greater modeling flexibility and a more refined representation of the parameters' uncertainty, sometimes it is convenient to define a prior distribution in terms of *hyperparameters* and *hyperpriors* [^4]. 
*Hyperparameters* refer to parameters that index a family of possible prior distributions for another parameter. On the other hand, *hyperpriors* are prior distributions for these hyperparameters [@Everitt_et_al_2010].[^4]: An interested reader can refer to McElreath [-@McElreath_2020] for a detailed explanation of priors and hyperpriors. ::: {.column-margin}**_Hyperparameters_** Parameters $\theta_{2}$ that indexes a family of possible prior distributions for another parameter $\theta_{1}$ [@Everitt_et_al_2010].:::::: {.column-margin}**_Hyperpriors_** Prior distributions for hyperparameters [@Everitt_et_al_2010].:::A simple example of the use of hyperpriors would be to define the regression model shown in @sec-howitworks in the following form:$$ \begin{align*}y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\\mu_{i} &= \beta \cdot x_{i} \\\beta &\sim \text{Normal}( 0, \text{exp}(v) ) \\v &\sim \text{Normal}(0, 3)\end{align*}$$where $v$ define the *hyperparameter* for the parameter $\beta$, and its associated distribution define its *hyperprior*. However, setting prior distributions through hyperparameters brings its own challenges [^7]. One notable challenge pertains to the geometry of the parameter's sample space. This implies that specific prior probabilistic representations, defined by the same hyperparameters, exhibit simpler sample geometries compared to others [^8]. The re-parametrization of priors into such simpler sample geometries leads to the notion of *non-centered priors*. In this approach, a parameter's prior distribution is expressed in terms of a hyperparameter, which is defined by a transformation of the original parameter of interest [@Gorinova_et_al_2019]. By incorporating *non-centered priors*, researchers can ensure the reliability of certain posterior distributions within Bayesian inference procedures. 
To illustrate, a straightforward example of a non-centered reparametrization of a prior can be demonstrated as follows:::: {.column-margin}**_Non-centered priors_** Expression of a parameter's distribution in terms of an hyperparameter defined by a transformation of the original parameter of interest [@Gorinova_et_al_2019].:::[^7]: An interested reader can refer to McElreath [-@McElreath_2020; p. 420] for a detailed explanation of the challenges and reparametrizations required to set priors with hyperparameters.[^8]: An interested reader can refer to McElreath [-@McElreath_2020], Gorinova et al. [-@Gorinova_et_al_2019] and Neal [-@Neal_2003]$$ \begin{align*}y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\\mu_{i} &= \beta \cdot x_{i} \\\beta &= z \cdot \text{exp}(v) \\v &\sim \text{Normal}(0, 3) \\z &\sim \text{Normal}( 0, 1 )\end{align*}$$where $z$ is a hyperparameter sampled independently from $v$, and the parameter of interest $\beta$ is obtained as a transformation of the two hyperparameters. @fig-reparametrization illustrates the differences in sampling geometries between a centered and a non-centered parametrization. It is evident that the sampling geometry depicted in the left panel of the figure is narrower than the one depicted in the right panel, and as a result, Bayesian inference procedures have an harder time sampling from the former than the latter distributions.```{r}#| label: code-reparametrization#| fig-cap: ''#| echo: true#| warning: falsen =5000# <1>v =rnorm( n=n, mean=0, sd=1 ) # <2>z =rnorm( n=n, mean=0, sd=1 )b_cent =rnorm( n=n, mean=0, sd=exp(v) ) # <3>b_non = z*exp(v) # <4>```1. simulation sample size2. hyperparameter simulation3. centered parametrization simulation4. 
non-centered parameterization simulation```{r}#| label: fig-reparametrization#| fig-cap: 'Centered and non-centered parameter spaces'#| echo: true#| warning: false#| fig-height: 4#| fig-width: 10par( mfrow=c(1,2) )plot( b_cent, v, pch=19, col=rgb(0,0,0,alpha=0.1), # <1>xlab=expression(beta), ylab=expression(v[b]),main='Centered parametrization' ) plot( z, v, pch=19, col=rgb(0,0,0,alpha=0.1), # <2>xlab=expression(z[b]), ylab=expression(v[b]),main='Non-centered parametrization' )par( mfrow=c(1,1) )```1. plot of centered parametrization2. plot of non-centered parametrization### Importance {#sec-whybayesian}Three characteristics of Bayesian inference makes it relevant to the present study. Firstly, they have shown superior performance in dealing with *complex and highly-parameterized* models [@Baker_1998; @Kim_1999]. Secondly, they are more suitable for *small* sample sizes [@Baldwin_et_al_2013; @Lambert_et_al_2005; @Depaoli_2014]. Lastly, they have an ability to incorporate *prior information* to constraint parameters within a permissible space (e.g., positive variances), thus preventing common estimation problems associated with classical methods [@Martin_et_al_1975; @Seaman_et_al_2011]. ::: {.column-margin}**_Benefits of Bayesian inference procedures_** More suitable to deal with: 1. Complex or highly-parameterized model2. Small sample sizes3. Parameter's constraints.:::## A tale of two distributions {#sec-interlude2}### The normal distribution {#sec-normal_dist}A normal distribution is a type of continuous probability distribution in which a random variable can take on values along the real line $\left( y_{i} \in [-\infty, \infty] \right)$. The distribution is characterized by two independent parameters: the mean $\mu$ and the standard deviation $\sigma$ [@Everitt_et_al_2010]. Thus, a random variable can take on values that are gathered around a mean $\mu$, with some values dispersed based on some amount of deviation $\sigma$, without any restriction. 
Importantly, by definition of the normal distribution, the *location* (mean) of the distribution does not influence its *spread* (deviation). @fig-normal_dist illustrates how the distribution of an outcome changes with different values of $\mu$ and $\sigma$. The left panel demonstrates that the distribution of the outcome can shift in terms of its location based on the value of $\mu$. The right panel shows how the distribution of the outcome can become narrower or wider based on the value of $\sigma$. It is noteworthy that alterations in the mean $\mu$ of the distribution have no impact on its standard deviation $\sigma$.

```{r}
#| label: fig-normal_dist
#| fig-cap: 'Normal distribution with different mean and standard deviations'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

mu = c(-1.5, 0, 1.5) # <2>
sigma = c(1.5, 1, 0.5)

par(mfrow=c(1,2))
cp = sapply( 1:length(mu), col.alpha, alpha=0.7 )
for(i in 1:length(mu)){
  if(i==1){
    curve( dnorm(x, mean=mu[i], sd=1), # <3>
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=mu, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( mu[1]==-1.5, mu[2]==0, mu[3]==+1.5, sigma==1) )
  } else{
    curve( dnorm(x, mean=mu[i], sd=1),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}
cp = sapply( 1:length(sigma), col.alpha, alpha=0.7)
for(i in 1:length(sigma)){
  if(i==1){
    curve( dnorm(x, mean=0, sd=sigma[i]), # <4>
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=0, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( sigma[1]==1.5, sigma[2]==1, sigma[3]==0.5, mu==0) )
  } else{
    curve( dnorm(x, mean=0, sd=sigma[i]),
           from=-3, to=3, ylim=c(0,1.5), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}
par(mfrow=c(1,1))
```
1. required package
2. parameters to plot: means and standard deviations
3. plotting normal distributions with different 'mu' and 'sigma=1'
4. plotting normal distributions with 'mu=0' and different sigma's

### The beta-proportion distribution {#sec-beta_dist}

A beta-proportion distribution is a type of continuous probability distribution in which a random variable can assume values within the continuous interval between zero and one $\left( y_{i} \in [0, 1] \right)$. The distribution is characterized by two parameters: the mean $\mu$ and the *sample size* $M$ [@Everitt_et_al_2010]. This implies that a random variable can take on values restricted within the unit interval, centered around a mean $\mu$, with dispersion governed by the *sample size* $M$. Additionally, two characteristics define the distribution. Firstly, like the random variable, the mean of the distribution can only take values within the unit interval ($\mu \in [0,1]$). Secondly, the mean and sample size parameters are no longer independent of each other.

@fig-betaprop_dist illustrates how an outcome with a beta-proportion distribution changes with different values of $\mu$ and $M$.
The figure reveals two prevalent patterns in the distribution: (1) the behavior of the dispersion, as measured by the sample size, depends on the mean of the distribution, and (2) the larger the sample size, the less dispersed the distribution is within the unit interval.

```{r}
#| label: fig-betaprop_dist
#| fig-cap: 'Beta-proportion distribution with different mean and sample sizes'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

mu = c(0.2, 0.5, 0.8) # <2>
M = c(2, 5, 20)

par(mfrow=c(1,2))
cp = sapply( 1:length(mu), col.alpha, alpha=0.7 )
for(i in 1:length(mu)){
  if(i==1){
    curve( dbeta2(x, prob=mu[i], theta=10), # <3>
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=mu, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( mu[1]==0.2, mu[2]==0.5, mu[3]==0.8, M==10) )
  } else{
    curve( dbeta2(x, prob=mu[i], theta=10),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}
cp = sapply( 1:length(M), col.alpha, alpha=0.7)
for(i in 1:length(M)){
  if(i==1){
    curve( dbeta2(x, prob=0.3, theta=M[i]), # <4>
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="outcome values", ylab="density")
    abline(v=0.3, col='gray', lty=2)
    legend('topleft', col=c(cp,'gray'), lwd=2, bty='n',
           legend=expression( M[1]==2, M[2]==5, M[3]==20, mu==0.3) )
  } else{
    curve( dbeta2(x, prob=0.3, theta=M[i]),
           from=0, to=1, ylim=c(0,8), lwd=2, col=cp[i],
           xlab="", ylab="", add=T )
  }
}
par(mfrow=c(1,1))
```
1. required package
2. parameters to plot: means and 'sample sizes'
3. plotting beta-proportion distributions with different 'mu' and 'M=10'
4. plotting beta-proportion distributions with 'mu=0.3' and different M's

### Importance {#sec-whybeta}

The assumption of normally distributed outcomes is ubiquitous in speech intelligibility research [see @Boonen_et_al_2021; @Flipsen_2006; @Lagerberg_et_al_2014]. Therefore, it is crucial to comprehend what it means for an outcome to follow a normal distribution.
In contrast, the significance of the beta-proportion distribution lies in providing a suitable alternative for modeling non-normal, *bounded* outcomes, such as the entropy scores utilized in this study. *Boundedness* refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur [@Lebl_2022]. Neglecting the bounded nature of an outcome can lead to *underfitting*, which occurs when the statistical model fails to capture the underlying patterns or complexity of the data. In such a scenario, a model may generate predictions outside the data range, which can be physically inconsistent or highly unlikely [@Everitt_et_al_2010], resulting in an inability to generalize its results when confronted with new data.

::: {.column-margin}
**_Boundedness_**

Refers to the restriction of data values within specific bounds or intervals, beyond which they cannot occur [@Lebl_2022].
:::

::: {.column-margin}
**_Underfitting_**

When the statistical model fails to capture the underlying patterns or complexity of the data [@Everitt_et_al_2010].
:::

## Linear Mixed Models {#sec-interlude3}

### The ordinary LMM {#sec-LMM}

An *ordinary linear mixed model (LMM)* is a procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations and one or more covariates [@Holmes_et_al_2019].
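To make the clustered structure concrete before writing the model formally, such data can be simulated directly. The sketch below is a minimal illustration in base R; all names and values (number of blocks, slope, standard deviations) are assumptions for the example, not part of the study's analysis:

```r
set.seed(1)                                 # replication seed (illustrative)
n_blocks = 10                               # hypothetical number of blocks
n_per    = 20                               # observations per block
block    = rep( 1:n_blocks, each=n_per )    # block membership of each observation
a_b      = rnorm( n_blocks, mean=0, sd=1 )  # random intercept, one draw per block
beta     = 0.5                              # hypothetical fixed slope
x        = rnorm( n_blocks*n_per )          # covariate
y        = beta*x + a_b[block] +            # linear predictor plus
           rnorm( n_blocks*n_per, sd=1 )    # observation-level residual

# observations in the same block share a_b, inducing within-block dependence:
# the block means of the residuals y - beta*x track the simulated a_b
round( tapply( y - beta*x, block, mean ), 2 )
```

Because every observation in block $b$ shares the same draw $a_{b}$, the residuals $y - \beta x_{i}$ are correlated within blocks; this shared component is exactly what the random effects in the model below are meant to absorb.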
A commonly known Bayesian probabilistic representation of an ordinary LMM can be expressed as follows:

::: {.column-margin}
**_Ordinary linear mixed model (LMM)_**

Procedure employed to estimate a linear relationship between the mean of a normally distributed outcome with clustered observations and one or more covariates [@Holmes_et_al_2019].
:::

$$
\begin{align*}
y_{ib} &= \beta x_{i} + a_{b} + \varepsilon_{ib} \\
\varepsilon_{ib} &\sim \text{Normal}(0, 1) \\
\beta &\sim \text{Normal}(0, 0.5) \\
a_{b} &\sim \text{Normal}(0, 1)
\end{align*}
$$

where $y_{ib}$ denotes the outcome's $i$'th observation clustered in block $b$, and $x_{i}$ denotes the covariate for observation $i$. Moreover, $\beta$ denotes the fixed slope of the regression, $a_{b}$ denotes the random effects, and $\varepsilon_{ib}$ denotes the random outcome residuals. The residuals $\varepsilon_{ib}$ are assumed to be normally distributed with mean zero and standard deviation equal to one. Additionally, prior to observing any data, $\beta$ is assumed to be normally distributed with mean zero and standard deviation equal to $0.5$. Similarly, $a_{b}$ is assumed to be normally distributed with mean zero and standard deviation equal to one.

### The generalized LMM {#sec-GLMM}

A *generalized linear mixed model (GLMM)* is a member of a class of models used to estimate (non)linear relationships between the mean of a (non)normally distributed outcome with clustered observations and one or more covariates [@Nelder_et_al_1996].
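To preview how a non-normal, bounded outcome fits into this framework, the sketch below pushes a linear predictor through an inverse-logit so its mean lands in the unit interval, and then draws beta-proportion values via R's `rbeta`, using the shape parameterization `shape1 = mu*M`, `shape2 = (1-mu)*M` (the parameterization behind `rethinking::dbeta2`). All numeric values are illustrative assumptions:

```r
set.seed(1)                                    # replication seed (illustrative)
inv_logit = function(x) exp(x) / (1 + exp(x))  # maps the real line into (0,1)

n    = 200
beta = 0.8                                     # hypothetical slope
M    = 10                                      # beta-proportion 'sample size'
x    = rnorm( n )                              # covariate
mu   = inv_logit( beta*x )                     # bounded mean, mu in (0,1)
y    = rbeta( n, shape1 = mu*M,                # beta-proportion draws with
              shape2 = (1-mu)*M )              # mean mu and 'sample size' M

range( y )   # all values fall inside the unit interval
```

No matter how large the linear predictor $\beta x_{i}$ becomes, the simulated outcome never leaves $[0,1]$; this is the work the link function does in the beta-proportion model developed below.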
Interestingly, the ordinary Bayesian LMM detailed in the previous section can be represented as a special case of a GLMM, as follows:

::: {.column-margin}
**_Generalized linear mixed model (GLMM)_**

Procedure employed to estimate (non)linear relationships between the mean of a (non)normally distributed outcome with clustered observations and one or more covariates [@Nelder_et_al_1996].
:::

$$
\begin{align*}
y_{ib} &\sim \text{Normal}( \mu_{ib}, 1) \\
\mu_{ib} &= \beta x_{i} + a_{b} \\
\beta &\sim \text{Normal}(0, 0.5) \\
a_{b} &\sim \text{Normal}(0, 1)
\end{align*}
$$

Notice this representation explicitly highlights the four components of a Bayesian GLMM: the likelihood component, the linear predictor, the link function and the priors [@McElreath_2020]. The likelihood component specifies the assumption about the distribution of an outcome, in this case a normal distribution with mean $\mu_{ib}$ and standard deviation equal to one. The linear predictor specifies the manner in which the covariate predicts the mean of the outcome; in this case it is a linear combination of the parameter $\beta$, the covariate $x_{i}$, and the random effects $a_{b}$. The link function specifies the relationship between the mean of the outcome $\mu_{ib}$ and the linear predictor; in this case no transformation is applied to the linear predictor to match its range with the range of the outcome, as both can take on values along the real line (refer to @sec-normal_dist). Lastly, the priors describe what is known about the parameters $\beta$ and $a_{b}$ before observing any empirical data.

::: {.column-margin}
**_GLMM components_**

1. Likelihood component
2. Linear predictor
3. Link function
4. Prior distributions
:::

On the other hand, a beta-proportion LMM is also a GLMM, and it can be represented probabilistically as follows:

$$
\begin{align*}
y_{ib} &\sim \text{BetaProp}( \mu_{ib}, 10 ) \\
\mu_{ib} &= \text{logit}^{-1}( \beta x_{i} + a_{b} ) \\
\beta &\sim \text{Normal}(0, 0.5) \\
a_{b} &\sim \text{Normal}(0, 1)
\end{align*}
$$

Notice the representation also highlights the four components of a Bayesian GLMM; however, their assumptions are now slightly different. The likelihood component assumes a beta-proportion distribution for the outcome, with mean $\mu_{ib}$ and sample size equal to $10$. The linear predictor is still a linear combination of the parameter $\beta$, the covariate $x_{i}$, and the random intercepts $a_{b}$. However, the link function now assumes the mean of the outcome is (non)linearly related to the linear predictor by an inverse-logit function: $\text{logit}^{-1}(x) = \exp(x) / (1+\exp(x))$. The inverse-logit function allows the linear predictor to match the range of the mean of the beta-proportion distribution, $\mu_{ib} \in [0,1]$ (refer to @sec-beta_dist). Lastly, the prior assumptions for $\beta$ and $a_{b}$ are also declared.

### Importance {#sec-whyGLMM}

Understanding LMMs is essential due to the ubiquitous assumption of normally distributed outcomes within the speech intelligibility research field [see @Boonen_et_al_2021; @Flipsen_2006; @Lagerberg_et_al_2014]. Furthermore, their significance also lies in their ability to model *clustered* outcomes. *Clustering* occurs when multiple observations arise from the same individual, location, or time [@McElreath_2020]. Accounting for data clustering is essential, as disregarding it may result in biased and inefficient parameter estimates. Consequently, such biases and inefficiencies can diminish *statistical power* or increase the likelihood of committing a *type I error*. *Statistical power* defines the model's ability to reject the null hypothesis when it is false [@Everitt_et_al_2010].
*Type I error* occurs when a null hypothesis is erroneously rejected [@Everitt_et_al_2010].

::: {.column-margin}
**_Clustering_**

Occurs when multiple observations arise from the same individual, location, or time [@McElreath_2020].
:::

::: {.column-margin}
**_Statistical power_**

The model's ability to reject the null hypothesis when it is false [@Everitt_et_al_2010].
:::

::: {.column-margin}
**_Type I error_**

The error that results when a null hypothesis is erroneously rejected [@Everitt_et_al_2010].
:::

Moreover, the significance of GLMMs lies in offering the same benefits as LMMs in terms of parameter unbiasedness and efficiency, while also allowing for the modeling of (non)linear relationships of (non)normally distributed outcomes. This is particularly important for modeling bounded data, such as the entropy scores utilized in this study. Refer to @sec-whybeta to understand the importance of considering the bounded nature of the data in the modeling process.

## Measurement error in an outcome {#sec-interlude4}

### What is the problem?

The problem of measurement error in an outcome is easier to understand with a motivating example. Using a model similar to the one depicted in @sec-howitworks, measurement error in the outcome can be represented probabilistically as follows:

$$
\begin{align*}
\tilde{y}_{i} &\sim \text{Normal}( y_{i}, s ) \\
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, 20 )
\end{align*}
$$

This representation effectively means that a *manifest* outcome $\tilde{y}_{i}$ is assumed to be normally distributed with a mean equal to the *latent* outcome $y_{i}$ and a measurement error $s$. The latent outcome $y_{i}$ is also assumed to be normally distributed, but with a mean $\mu_{i}$ and a standard deviation of one. The mean of the latent outcome is explained by a linear combination of the covariate $x_{i}$ and its expected effect $\beta$.
Lastly, prior to observing any data, $\beta$ is assumed to follow a uniform distribution within the range of $[-20, +20]$, representing a non-informative prior.

For illustrative purposes, a simulated outcome with $n=100$ observations was generated, assuming $\beta=0.2$ and a measurement error of $s=2$. @fig-measurement_simulation shows the scatter plot of the generated data (see code below). The left panel demonstrates that the *manifest* outcome has a larger spread than the *latent* outcome depicted in the right panel. As a result, although $\beta$ is expected to be estimated in an unbiased manner, the statistical hypothesis tests for the parameter will likely be affected by this larger variability.

The estimation output confirms this hypothesis. The posterior distribution of $\beta$, estimated using the manifest outcome, has a larger standard deviation than the one estimated using the appropriate latent outcome (see @fig-measurement_inference and code output below). Furthermore, the code output shows the parameter's posterior distribution can no longer reject the null hypothesis at confidence levels of $90\%$ and $95\%$, indicating reduced statistical power.

```{r}
#| label: code-measurement_simulation
#| fig-cap: ''
#| echo: true
#| warning: false

set.seed(12345) # <1>
n = 100 # <2>
b = 0.2 # <3>
x = rnorm( n=n, mean=0, sd=1 ) # <4>
mu_y = b*x # <5>
y = rnorm( n=n, mean=mu_y, sd=1 ) # <6>
s = 2 # <7>
y_tilde = rnorm( n=n, mean=y, sd=s ) # <8>
```
1. replication seed
2. simulation sample size
3. covariate effect
4. covariate simulation
5. linear predictor on outcome mean
6. latent outcome simulation
7. measurement error
8. manifest outcome simulation

```{r}
#| label: code-measurement_inference
#| fig-cap: ''
#| echo: true
#| warning: false

# grid approximation
Ngp = 1000 # <1>
b_cand = seq( from=-20, to=20, length.out=Ngp ) # <2>
udf = function(i){ b_cand[i]*x } # <3>
mu_y = sapply( 1:length(b_cand), udf ) # <4>
udf = function(i){ prod( dnorm( y_tilde, mean=mu_y[,i], sd=s ) ) } # <5>
y_lik_man = sapply( 1:length(b_cand), udf ) # <6>
udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) } # <7>
y_lik_lat = sapply( 1:length(b_cand), udf ) # <8>
b_prior = rep( 1/40, length(b_cand) ) # <9>
b_prop_man = y_lik_man * b_prior # <10>
b_post_man = b_prop_man / sum(b_prop_man) # <11>
b_prop_lat = y_lik_lat * b_prior # <12>
b_post_lat = b_prop_lat / sum(b_prop_lat) # <13>
```
1. number of points in candidate list
2. candidate list for parameter
3. user defined function: linear predictor for each candidate
4. calculation of the linear predictor for each candidate
5. user defined function: product of individual observation likelihoods for manifest outcome
6. manifest outcome data likelihood
7. user defined function: product of individual observation likelihoods for latent outcome
8. latent outcome data likelihood
9. uniform prior distribution for parameter, on manifest and latent outcomes
10. proportional posterior distribution for parameter on manifest outcome
11. posterior distribution for parameter on manifest outcome
12. proportional posterior distribution for parameter on latent outcome
13. posterior distribution for parameter on latent outcome

```{r}
#| label: code-measurement_summary
#| fig-cap: ''
#| echo: true
#| warning: false

paste0( 'true beta = ', b ) # <1>

# manifest outcome
b_exp_man = sum( b_cand * b_post_man ) # <2>
paste0( 'estimated beta (expectation on manifest) = ', round(b_exp_man, 3) )
b_var_man = sqrt( sum( ( (b_cand-b_exp_man)^2 ) * b_post_man ) ) # <3>
paste0( 'estimated beta (standard deviation on manifest) = ', round(b_var_man, 3) )

# latent outcome
b_exp_lat = sum( b_cand * b_post_lat ) # <4>
paste0( 'estimated beta (expectation on latent) = ', round(b_exp_lat, 3) )
b_var_lat = sqrt( sum( ( (b_cand-b_exp_lat)^2 ) * b_post_lat ) ) # <5>
paste0( 'estimated beta (standard deviation on latent) = ', round(b_var_lat, 3) )

# null hypothesis rejection
b_prob_man = sum( b_post_man[ b_cand > 0 ] ) # <6>
paste0( 'P(estimated beta on manifest > 0) = ', round(b_prob_man, 3) )
b_prob_lat = sum( b_post_lat[ b_cand > 0 ] ) # <7>
paste0( 'P(estimated beta on latent > 0) = ', round(b_prob_lat, 3) )
```
1. true value for the parameter
2. expected value for the parameter on manifest outcome
3. standard deviation for the parameter on manifest outcome
4. expected value for the parameter on latent outcome
5. standard deviation for the parameter on latent outcome
6. probability that the parameter is greater than zero, on manifest outcome
7. probability that the parameter is greater than zero, on latent outcome

```{r}
#| label: fig-measurement_simulation
#| fig-cap: 'Measurement error simulation'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

par( mfrow=c(1,2) )
plot( x, y_tilde, xlim=c(-3,3), ylim=c(-7,7), # <1>
      pch=19, col=rgb(0,0,0,alpha=0.3),
      ylab=expression(tilde(y)),
      main='manifest outcome' )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp_man, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )
plot( x, y, xlim=c(-3,3), ylim=c(-7,7), # <2>
      pch=19, col=rgb(0,0,0,alpha=0.3),
      main='latent outcome' )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp_lat, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )
par( mfrow=c(1,1) )
```
1. simulation plot of manifest outcome
2. simulation plot of latent outcome

```{r}
#| label: fig-measurement_inference
#| fig-cap: 'Bayesian inference: grid approximation on measurement error outcomes'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

par(mfrow=c(1,2))
plot( b_cand, b_post_man, type='l', xlim=c(-0.5,1), # <1>
      main='Posterior on manifest outcome',
      xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp_man), lty=2, col=c('gray', 'blue', 'red') )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )
plot( b_cand, b_post_lat, type='l', xlim=c(-0.5,1), # <2>
      main='Posterior on latent outcome',
      xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp_lat), lty=2, col=c('gray', 'blue','red') )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )
par(mfrow=c(1,1))
```
1. posterior distribution density plot on manifest outcome
2. posterior distribution density plot on latent outcome

### How to solve it?

*Latent variables* can be used to address the problem arising from the larger observed variability in one or more manifest outcomes.
A *latent variable* is a variable that cannot be directly measured but is assumed to be primarily responsible for the variability in one or more manifest variables [@Everitt_et_al_2010]. Latent variables can be interpreted as hypothetical constructs, traits, or *true* variables that account for the variability that induces dependence in one or more manifest variables [@Rabe_et_al_2004a]. This concept is akin to a linear mixed model, where the random effects serve to account for the variability that induces dependence within clustered outcomes [@Rabe_et_al_2004a] (refer to @sec-interlude3). The most widely known examples of latent variable models include Confirmatory Factor Analysis and Structural Equation Models (CFA and SEM, respectively).

::: {.column-margin}
**_Latent variables_**

Variables that cannot be measured directly but are assumed to be principally responsible for the common variability in one or more manifest variables [@Everitt_et_al_2010].
:::

Commonly, latent variable models consist of two parts: a measurement part and a structural part. In the measurement part, the principles of the Thurstonian model [@Thurstone_1927; @Luce_1959] are employed to aggregate one or more manifest variables and estimate a latent variable. In the structural part, regression-like relationships among latent and other manifest variables are specified, allowing researchers to test hypotheses about their (causal) relationships [@Hoyle_et_al_2014]. While the measurement part is sometimes of interest in its own right, the substantive model of interest is often defined by the structural part [@Rabe_et_al_2004a].

### Importance {#sec-whylatent}

It becomes evident that when an outcome is measured with error, estimation procedures based on standard assumptions yield inefficient parameter estimates. This implies that the parameters are not estimated with sufficient precision.
Consequently, such inefficiency can reduce statistical power and increase the likelihood of committing a *type II error*, which occurs when a null hypothesis is erroneously accepted [@Everitt_et_al_2010].

::: {.column-margin}
**_Type II error_**

The error that results when a null hypothesis is erroneously accepted [@Everitt_et_al_2010].
:::

Therefore, the issue of measurement error in an outcome is highly relevant to this study. This research assumes that a speaker's (latent) potential intelligibility contributes, in part, to the observed variability in the speaker's (manifest) entropy scores. Given the interest in testing hypotheses about the potential intelligibility of speakers, and considering that the entropy scores are subject to measurement error, it becomes necessary to use latent variables to generate precise parameter estimates with which to test the hypotheses of interest.

## Distributional departures {#sec-interlude5}

### Heteroscedasticity {#sec-heteroscedasticity}

In the context of regression analysis, *heteroscedasticity* occurs when the variance of an outcome depends on the values of another variable. The opposite case is called *homoscedasticity*. An example of heteroscedasticity can be probabilistically represented as follows:

::: {.column-margin}
**_Heteroscedasticity_**

Occurs when the variance (standard deviation) of an outcome depends on the values of another variable. The opposite case is called *homoscedasticity* [@Everitt_et_al_2010].
:::

$$
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, \sigma_{i} ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\sigma_{i} &= \exp( \gamma \cdot x_{i} ) \\
\beta &\sim \text{Uniform}( -20, 20 ) \\
\gamma &\sim \text{Uniform}( -20, 20 )
\end{align*}
$$

This representation implies that an outcome $y_{i}$ is assumed normally distributed with mean $\mu_{i}$ and standard deviation $\sigma_{i}$. Furthermore, the mean and standard deviation of the outcome are explained by the covariate $x_{i}$ through the parameters $\beta$ and $\gamma$.
Lastly, prior to observing any data, $\beta$ and $\gamma$ are assumed to be uniformly distributed in the range of $[-20,+20]$.

@fig-heteroscedasticity illustrates the presence of heteroscedasticity using the previous representation, assuming a sample size of $n=100$ and parameters $\beta=0.2$ and $\gamma=1$. Notice the variability of the outcome increases as the covariate increases. Consequently, it is easy to intuit that this difference in the outcome's variability could have an impact on the statistical hypothesis tests of $\beta$, and even on the estimate itself. To prove the intuition, an incorrect model is used to estimate $\beta$:

$$
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, 20 )
\end{align*}
$$

As a result, the intuition is proven accurate. When an outcome is erroneously assumed *homoscedastic*, the parameter estimates not only become inefficient but are also estimated farther from the *true* value, as seen in the code output below and in @fig-heteroscedasticity_inference.

```{r}
#| label: code-heteroscedasticity
#| fig-cap: ''
#| echo: true
#| warning: false

set.seed(12345) # <1>
n = 100 # <2>
b = 0.2 # <3>
g = 1
x = rnorm( n=n, mean=0, sd=1 ) # <4>
mu_y = b*x # <5>
s_y = exp(g*x)
y = rnorm( n=n, mean=mu_y, sd=s_y ) # <6>
```
1. replication seed
2. simulation sample size
3. beta and gamma effects
4. covariate simulation
5. (non)linear predictor on outcome mean and standard deviation
6. outcome simulation

```{r}
#| label: code-heteroscedasticity_inference
#| fig-cap: ''
#| echo: true
#| warning: false

# grid approximation
Ngp = 1000 # <1>
b_cand = seq( from=-20, to=20, length.out=Ngp ) # <2>
udf = function(i){ b_cand[i]*x } # <3>
mu_y = sapply( 1:length(b_cand), udf ) # <4>
udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) } # <5>
y_lik = sapply( 1:length(b_cand), udf ) # <6>
b_prior = rep( 1/40, length(b_cand) ) # <7>
b_prop = y_lik * b_prior # <8>
b_post = b_prop / sum(b_prop) # <9>
```
1. number of points in candidate list
2. candidate list for parameter
3. user defined function: linear predictor for each candidate
4. calculation of the linear predictor for each candidate
5. user defined function: product of individual observation likelihoods
6. outcome data likelihood
7. uniform prior distribution for parameter (min=-20, max=20)
8. proportional posterior distribution for parameter
9. posterior distribution for parameter

```{r}
#| label: code-heteroscedasticity_summary
#| fig-cap: ''
#| echo: true
#| warning: false

paste0( 'true beta = ', b ) # <1>
b_exp = sum( b_cand * b_post ) # <2>
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )
b_max = b_cand[ b_post==max(b_post) ] # <3>
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )
b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) ) # <4>
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )
b_prob = sum( b_post[ b_cand > 0 ] ) # <5>
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
```
1. true value for the parameter
2. expected value for the parameter
3. maximum probability value for the parameter
4. standard deviation for the parameter
5. probability that the parameter is greater than zero

```{r}
#| label: fig-heteroscedasticity
#| fig-cap: 'Heteroscedasticity simulation'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 5

plot( x, y, xlim=c(-3,3), ylim=c(-6,6), # <1>
      pch=19, col=rgb(0,0,0,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp, lty=2, col='red' )
abline( a=-4, b=-1, lty=2)
abline( a=4.4, b=1.5, lty=2)
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )
```
1. scatter plot of a heteroscedastic outcome

```{r}
#| label: fig-heteroscedasticity_inference
#| fig-cap: 'Bayesian inference: grid approximation'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

par(mfrow=c(1,2))
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5), # <1>
      main='Prior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1), # <2>
      main='Posterior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp, b_max), lty=2, col=c('gray','blue', rainbow(2)) )
legend( 'topleft', legend=c('true', 'expected', 'max prob.'),
        bty='n', col=c('blue',rainbow(2)), lty=rep(2,3) )
par(mfrow=c(1,1))
```
1. prior distribution density plot
2. posterior distribution density plot

### Outliers {#sec-outliers}

In regression analysis, *outliers* are defined as observations that appear to deviate markedly from the other sample data points among which they occur [@Everitt_et_al_2010]. Although outliers admit no unique probabilistic representation, a simple example is illustrated in @fig-outliers. The figure depicts the presence of three influential observations in the outcome (colored blue). It is easy to intuit that, in the presence of influential observations, the parameter estimates, and the hypothesis tests resulting from them, can be affected.

::: {.column-margin}
**_Outlier_**

An observation that appears to deviate markedly from the other sample data points among which it occurs [@Everitt_et_al_2010].
:::

The intuition is proven correct when $\beta$ is estimated using the same *incorrect* model used in @sec-heteroscedasticity.
When an outcome is erroneously assumed to be free of outliers, the parameter value is estimated farther from the truth, as observed in the code output below and in @fig-outliers_inference.

$$
\begin{align*}
y_{i} &\sim \text{Normal}( \mu_{i}, 1 ) \\
\mu_{i} &= \beta \cdot x_{i} \\
\beta &\sim \text{Uniform}( -20, 20 )
\end{align*}
$$

```{r}
#| label: code-outliers
#| fig-cap: ''
#| echo: true
#| warning: false

set.seed(12345) # <1>
n = 100 # <2>
b = 0.2 # <3>
x = rnorm( n=n, mean=0, sd=1 ) # <4>
mu_y = b*x # <5>
y = rnorm( n=n, mean=mu_y, sd=1 ) # <6>
idx = which( x>1 ) # <7>
sel = 1:3
y[idx[sel]] = 6
```
1. replication seed
2. simulation sample size
3. beta effect
4. covariate simulation
5. linear predictor on outcome mean
6. outcome simulation
7. outlier simulation

```{r}
#| label: code-outliers_inference
#| fig-cap: ''
#| echo: true
#| warning: false

# grid approximation
Ngp = 1000 # <1>
b_cand = seq( from=-20, to=20, length.out=Ngp ) # <2>
udf = function(i){ b_cand[i]*x } # <3>
mu_y = sapply( 1:length(b_cand), udf ) # <4>
udf = function(i){ prod( dnorm( y, mean=mu_y[,i], sd=1 ) ) } # <5>
y_lik = sapply( 1:length(b_cand), udf ) # <6>
b_prior = rep( 1/40, length(b_cand) ) # <7>
b_prop = y_lik * b_prior # <8>
b_post = b_prop / sum(b_prop) # <9>
```
1. number of points in candidate list
2. candidate list for parameter
3. user defined function: linear predictor for each candidate
4. calculation of the linear predictor for each candidate
5. user defined function: product of individual observation likelihoods
6. outcome data likelihood
7. uniform prior distribution for parameter (min=-20, max=20)
8. proportional posterior distribution for parameter
9. posterior distribution for parameter

```{r}
#| label: code-outliers_summary
#| fig-cap: ''
#| echo: true
#| warning: false

paste0( 'true beta = ', b ) # <1>
b_exp = sum( b_cand * b_post ) # <2>
paste0( 'estimated beta (expectation) = ', round(b_exp, 3) )
b_max = b_cand[ b_post==max(b_post) ] # <3>
paste0( 'estimated beta (maximum probability) = ', round(b_max, 3) )
b_var = sqrt( sum( ( (b_cand-b_exp)^2 ) * b_post ) ) # <4>
paste0( 'estimated beta (standard deviation) = ', round(b_var, 3) )
b_prob = sum( b_post[ b_cand > 0 ] ) # <5>
paste0( 'P(estimated beta > 0) = ', round(b_prob, 3) )
```
1. true value for the parameter
2. expected value for the parameter
3. maximum probability value for the parameter
4. standard deviation for the parameter
5. probability that the parameter is greater than zero

```{r}
#| label: fig-outliers
#| fig-cap: 'Outliers simulation'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 5

plot( x, y, xlim=c(-3,3), ylim=c(-6,6), # <1>
      pch=19, col=rgb(0,0,0,alpha=0.3) )
points( x[idx[sel]], y[idx[sel]], # <1>
        pch=19, col=rgb(0,0,1,alpha=0.3) )
abline( a=0, b=b, lty=2, col='blue')
abline( a=0, b=b_exp, lty=2, col='red' )
legend( 'topleft', legend=c('true', 'expected'),
        bty='n', col=c('blue','red'), lty=rep(2,3) )
```
1. scatter plot of an outcome with outliers

```{r}
#| label: fig-outliers_inference
#| fig-cap: 'Bayesian inference: grid approximation'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

par(mfrow=c(1,2))
plot( b_cand, b_prior, type='l', xlim=c(-1.5,1.5), # <1>
      main='Prior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=0, lty=2, col='gray' )
plot( b_cand, b_post, type='l', xlim=c(-1,1), # <2>
      main='Posterior distribution',
      xlab=expression(beta), ylab='probability' )
abline( v=c(0, b, b_exp, b_max), lty=2, col=c('gray','blue', rainbow(2)) )
legend( 'topleft', legend=c('true', 'expected', 'max prob.'),
        bty='n', col=c('blue',rainbow(2)), lty=rep(2,3) )
par(mfrow=c(1,1))
```
1. prior distribution density plot
2. posterior distribution density plot

### Solution

As recommended by McElreath [-@McElreath_2020], *robust regression models* can be used to deal with these types of distributional departures. *Robust regression models* are a general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate failures in the assumptions of a model [@Everitt_et_al_2010]. The procedure consists of modifying the statistical models to include traits that effectively make them *robust* to small departures from the distributional assumption, such as heteroscedastic errors or the presence of *outliers*.

::: {.column-margin}
**_Robust regression models_**

A general class of statistical procedures designed to reduce the sensitivity of the parameter estimates to mild or moderate failures in the assumptions of a model [@Everitt_et_al_2010].
:::

### Importance {#sec-whyrobust}

Dealing with *heteroscedasticity* and identifying *outliers* through preliminary univariate procedures is prone to the erroneous transformation or exclusion of valuable information. This can ultimately *bias* the parameter estimates, and even make them inefficient [@McElreath_2020]. *Bias* refers to the extent to which the statistical method used in a study does not estimate the quantity thought to be estimated [@Everitt_et_al_2010].

::: {.column-margin}
**_Bias_**

The extent to which the statistical method used in a study does not estimate the quantity thought to be estimated [@Everitt_et_al_2010].
:::

Dealing with the possibility of heteroscedasticity or outlying observations is relevant to the present study, because there is an interest in testing hypotheses about the potential intelligibility of speakers.
Therefore, it is necessary to consider robust regression models to assess these distributional departures and generate unbiased parameter estimates.

# Research gaps {#sec-researchG}

The process of decoding the words in a message, known as *intelligibility*, is crucial for understanding the intended meaning or purpose of a message. *Speech intelligibility* refers to the extent to which a listener can accurately recover the elements in an acoustic signal produced by a speaker, such as phonemes or words [@Freeman_et_al_2017; @vanHeuven_2008; @Whitehill_et_al_2004]. 

::: {.column-margin}
**_Speech intelligibility_**

The extent to which a listener can accurately recover the elements in an acoustic signal produced by a speaker, such as phonemes or words [@Freeman_et_al_2017; @vanHeuven_2008; @Whitehill_et_al_2004].
:::

In the research on *speech intelligibility*, studies that utilized entropy scores have extensively explored the statistical modeling of data clustering. For instance, Boonen et al. [-@Boonen_et_al_2021] addressed the clustering of entropy data by employing LMM (refer to @sec-LMM). However, to the best of the authors' knowledge, no prior research has specifically addressed the modeling of the bounded nature of the entropy data. @sec-whybeta and @sec-whyGLMM present compelling arguments on the critical importance of considering both features in the statistical modeling of the data.

Furthermore, no study has attempted to construct a *latent variable* that unveils the speaker's (underlying) *potential intelligibility*. @sec-whylatent elaborates on the importance of considering measurement error in statistical models, and the role of latent variables in accomplishing such a task.

Hence, this study aims to fill these gaps by introducing the *Bayesian beta-proportion linear, latent and mixed model*, a type of *generalized linear latent and mixed model (GLLAMM)* [@Skrondal_et_al_2004a; @Rabe_et_al_2004a; @Rabe_et_al_2004c; @Rabe_et_al_2004b]. 
@sec-whybayesian elaborates on the relevance of Bayesian inference procedures for this study.

# Research questions {#sec-researchQ}

This study aims to address three research questions that provide insights into modeling speech intelligibility using clustered and bounded entropy data.

**_[Research question 1]:_** Does the *Bayesian beta-proportion linear, latent and mixed model* account for all the observed variability in the data and yield better predictions? More specifically, the study seeks to determine whether the model can improve the predictions of entropy data at different clustering levels (i.e., word, sentence, and speaker level) compared to the ubiquitous normally distributed model. 

**_[Research question 2]:_** Can a measure of a speaker's *potential intelligibility* be constructed? In this regard, the study will leverage the principles laid down in @sec-whylatent to aggregate listener assessments, in the form of (manifest) entropy scores, into a *latent variable* that might unveil the speaker's (latent) *potential intelligibility*.

**_[Research question 3]:_** Can the speaker's potential intelligibility be used to test specific *research hypotheses*? Particularly, the study aims to demonstrate how the new analytic paradigm could be employed to examine whether child-related factors, such as children's chronological age and hearing status, contribute to intelligibility.

# Data {#sec-data}

The data comprised the transcriptions of children's spontaneous speech samples originally collected by Boonen et al. [-@Boonen_et_al_2021]. The analysis code is illustrated with the dataset, enabling readers to verify the correctness of the implementation. However, the data is not publicly available due to privacy restrictions. 
Nonetheless, the data can be provided by the corresponding author upon reasonable request.

## Transcription task {#sec-transcriptions}

For the transcription task, $105$ listeners and $320$ sentences, with a maximum of $11$ words per sentence (*speech samples*), were randomly assigned to five blocks. Each block consisted of approximately $21$ listeners who transcribed $64$ sentences presented in a random order. As a result, a total of $47,514$ transcribed words were generated from the original $2,263$ words present in the speech samples.

::: {.column-margin}
**_Speech samples_**

Sentences with a minimum of $3$ and a maximum of $11$ words per sentence.
:::

## Entropy calculation {#sec-entropy_calculation}

The $47,514$ orthographic transcriptions were automatically aligned with a Python script at the sentence level, in a column-like grid structure similar to the one presented in @tbl-alignment. This alignment process was repeated for each sentence within each speaker and block, and the output was manually checked and adjusted (if needed) in order to appropriately align the words. For more details on the alignment procedure, refer to Boonen et al. [-@Boonen_et_al_2021].

Next, the aligned transcriptions were aggregated by listener, yielding $2,263$ entropy scores, one score per word. The entropy scores were calculated following Shannon's entropy formula [-@Shannon_1948]:

::: {.column-margin}
**_Entropy formula_**
:::

$$ 
\begin{equation}
H_{wsib} = H( \boldsymbol{p} ) = - \frac{ \sum_{k=1}^{K} p_{k} \cdot log_{2}(p_{k}) }{ log_{2}(J)}
\end{equation}
$$ {#eq-entropy}

where $H_{wsib}$ denotes the entropy score bounded within the continuous interval between zero and one $\left( H_{wsib} \in [0,1] \right)$, with $w$ denoting the word number, $s$ the sentence number, $i$ the speaker number, and $b$ the block number. 
Moreover, $K$ describes the number of different word types within transcriptions, $p_{k}$ denotes the proportion of word type $k$ within transcriptions, and $J$ defines the total number of word transcriptions. Notice that, by design, the total number of word transcriptions $J$ corresponds with the number of listeners per block, i.e., $21$ listeners. Furthermore, $p_{k}$ is calculated as follows: 

$$ 
\begin{equation}
p_{k} = \frac{ \sum_{j=1}^{J} 1(T_{jk}) }{ J } 
\end{equation}
$$ {#eq-prop_wordtype}

where $1(T_{jk})$ denotes an indicator function that takes the value of one when the word type $k$ is present in transcription $j$.

The resulting entropy scores served as the outcome variable, capturing the agreement or disagreement among listeners' word transcriptions. A high degree of agreement among transcriptions indicates higher *intelligibility* and yields lower entropy scores. A low degree of agreement among transcriptions indicates lower *intelligibility* and yields higher entropy scores [@Boonen_et_al_2021; @Faes_et_al_2021].

::: {.column-margin}
**_Entropy interpretation_**

A high degree of agreement among transcriptions indicates higher *intelligibility* and yields lower entropy scores. A low degree of agreement among transcriptions indicates lower *intelligibility* and yields higher entropy scores [@Boonen_et_al_2021; @Faes_et_al_2021].
:::

<!-- It is worth noting that in the current study, the word-level entropy scores were used as the outcome variable. This distinction sets it apart from the investigation conducted by Boonen et al. [-@Boonen_et_al_2021], where the authors utilized the average entropy score at the sentence level, obtained by averaging the word-level entropy scores for each sentence within each speaker. 
-->

+---------------+---------+---------+---------+---------+---------+
| Transcription | Words   |         |         |         |         |
+---------------+---------+---------+---------+---------+---------+
| Number        | 1       | 2       | 3       | 4       | 5       |
+:=============:+:=======:+:=======:+:=======:+:=======:+:=======:+
| 1             | de      | jongen  | ziet    | een     | kikker  |
+---------------+---------+---------+---------+---------+---------+
|               | the     | boy     | sees    | a       | frog    |
+---------------+---------+---------+---------+---------+---------+
| 2             | de      | jongen  | ziet    | de      | [X]     |
+---------------+---------+---------+---------+---------+---------+
|               | the     | boy     | sees    | the     | [X]     |
+---------------+---------+---------+---------+---------+---------+
| 3             | de      | jongen  | zag     | [B]     | kokkin  |
+---------------+---------+---------+---------+---------+---------+
|               | the     | boy     | saw     | [B]     | cook    |
+---------------+---------+---------+---------+---------+---------+
| 4             | de      | jongen  | zag     | geen    | kikkers |
+---------------+---------+---------+---------+---------+---------+
|               | the     | boy     | saw     | no      | frogs   |
+---------------+---------+---------+---------+---------+---------+
| 5             | de      | hond    | zoekt   | een     | [X]     |
+---------------+---------+---------+---------+---------+---------+
|               | the     | dog     | searches| a       | [X]     |
+---------------+---------+---------+---------+---------+---------+
| **Entropy**   | $0$     | $0.3109$| $0.6555$| $0.8277$| $1$     |
+---------------+---------+---------+---------+---------+---------+

: Hypothetical alignment of word transcriptions and entropy score calculations for the first sentence, produced by the first speaker assigned to the first block, and transcribed by five listeners $\left( s=1, i=1, b=1, J=5 \right)$. *[B]* represents a blank space, and *[X]* an unidentifiable speech segment. Extracted from Boonen et al. [-@Boonen_et_al_2021], and slightly modified for illustrative purposes. {#tbl-alignment .striped .hover}

It is relevant to exemplify the entropy calculation procedure. 
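Before working through the hand calculations, @eq-entropy can be sketched directly in R. The helper `H_word` below is illustrative and not part of the original analysis code; note that it treats identical strings as one word type, so the convention of counting each *[X]* as a distinct word type would require relabeling the *[X]* marks as unique strings beforehand:

```{r}
#| label: code-entropy_sketch
#| echo: true
#| warning: false
# normalized Shannon entropy for one word position across J transcriptions (@eq-entropy)
H_word = function( transcriptions ){ # <1>
  J = length( transcriptions )
  p = table( transcriptions ) / J # <2>
  - sum( p * log2(p) ) / log2(J) # <3>
}
word2 = c('jongen','jongen','jongen','jongen','hond') # <4>
word4 = c('een','de','[B]','geen','een') # <4>
round( c( H_word(word2), H_word(word4) ), 4 )
```
1. entropy score for a vector of word transcriptions
2. proportion of each word type
3. normalized Shannon entropy
4. word transcriptions taken from @tbl-alignment (positions two and four)

The two computed scores reproduce the corresponding entries of the entropy row in @tbl-alignment.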
For that purpose, the words in positions two, four and five observed in @tbl-alignment were used. These words were assumed present in the first sentence, produced by the first speaker assigned to the first block, and transcribed by five listeners ($w=\{2,4,5\}$, $s=1$, $i=1$, $b=1$, $J=5$). 

For the second word, the first four listeners identified the word type *jongen* $(T_{j1})$, while the last identified the word type *hond* $(T_{j2})$. Therefore, two word types were identified ($K=2$), with proportions equal to $\{ p_{1}, p_{2} \} = \{ 4/5, 1/5 \} = \{ 0.8, 0.2 \}$, and an entropy score equal to:

$$ 
H_{2111} = - \frac{ 0.8 \cdot log_{2}(0.8) + 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} \approx 0.3109
$$

For the fourth word, two listeners identified the word type *een* $(T_{j1})$, one listener the word type *de* $(T_{j2})$, and another the word type *geen* $(T_{j3})$. It is important to highlight the presence of a blank space *[B]* in transcription number three. A blank space *[B]* is a symbol that denotes the absence of a word in a position where a word was expected, as compared with the other transcriptions. Notice that, for calculation purposes, because the blank space was not expected in that position, it was considered a different word type $(T_{j4})$. Therefore, four word types were registered ($K=4$), with proportions equal to $\{ p_{1}, p_{2}, p_{3}, p_{4} \} = \{ 2/5, 1/5, 1/5, 1/5 \} = \{ 0.4, 0.2, 0.2, 0.2 \}$, and an entropy score equal to:

$$ 
H_{4111} = - \frac{ 0.4 \cdot log_{2}(0.4) + 3 \cdot 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} \approx 0.8277
$$

Lastly, for the fifth word, each listener transcribed a different word. It is important to highlight that when a listener did not manage to identify a complete word, or part of it, (s)he was instructed to write *[X]* in that position. However, for the calculation of the entropy score, if more than one listener marked an unidentifiable word with *[X]*, each one of them was considered a different word type. 
This was done to avoid the artificial reduction of the entropy score, as *[X]* values already indicate the lack of the word's intelligibility. Consequently, five word types were observed, $T_{j1}=$ *kikker*, $T_{j2}=$ *[X]*, $T_{j3}=$ *kokkin*, $T_{j4}=$ *kikkers*, $T_{j5}=$ *[X]* ($K=5$), with proportions equal to $\{ p_{1}, p_{2}, p_{3}, p_{4}, p_{5} \} = \{ 1/5, 1/5, 1/5, 1/5, 1/5 \} = \{ 0.2, 0.2, 0.2, 0.2, 0.2 \}$, and an entropy score equal to:

$$ 
H_{5111} = - \frac{ 5 \cdot 0.2 \cdot log_{2}(0.2) }{ log_{2}(5)} = 1
$$

## Exploring the data {#sec-data_exploration}

```{r}
#| label: code-data_load
#| fig-cap: ''
#| echo: false
#| warning: false
data_dir = file.path('/home/josema/Desktop/1. Work/1 research/PhD Antwerp/#thesis/#data/final')
data_H = read.csv( file.path(data_dir, 'data_H_frame.csv') ) 
# str(data_H)

# transformation
data_H$HS = factor( data_H$hearing_status )
data_H$A = data_H$chronological_age
data_H$Am = data_H$chronological_age - min(data_H$chronological_age)
data_H$child = factor(data_H$childID)
```

As expected, the data exploration reveals two significant features of the entropy scores: *clustering* and *boundedness* (refer to @sec-whybeta and @sec-whyGLMM). In the case of the entropy scores, clustering arises due to the presence of various word-level scores generated for numerous sentences, originating from different speakers and evaluated in different blocks (see the code output below, depicting the first ten observations of the data). On the other hand, entropy scores exhibit *boundedness* as they can only take on values within the continuous interval between zero and one, particularly $H_{wsib} \in [0,1]$ (see @fig-entropy_data showing three randomly selected children).

```{r}
#| label: code-dataframe
#| fig-cap: ''
#| echo: true
#| warning: false
var_int = c('block','childID','sentence','word','HS','A','Am','Hwsib') # <1>
head( data_H[, var_int], 10 ) # <2>
```
1. selecting variables of interest
2. 
showing the first 10 observations of the data

```{r}
#| label: fig-entropy_data
#| fig-cap: 'Entropy scores distribution: all sentences of selected children'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
require(rethinking) # <1>

set.seed( 12345 ) # <2>
children = sample( 1:32, size=3, replace=F ) 

par(mfrow=c(1,3))
for( i in children ){ # <3>
  idx_child = data_H$childID == i
  hist( data_H$Hwsib[idx_child], xlab='', xlim=c(-0.2,1.2), breaks=15, col=rgb(0,0,0,alpha=0.2),
    main=paste0('Child ', i, ', all sentences') )
  abline( v=c(0,1), col='gray', lty=2 )
}
par(mfrow=c(1,1))
```
1. package requirement
2. seed for replication and random sample of three children
3. histogram plot for all sentences of childID

Additionally, the data shows the $320$ children's speech samples consist of sentences with a minimum of $3$ and a maximum of $11$ words per sentence ($\mu=7.07$, $\sigma=1.06$), where most of the speech samples have between $5$ and $9$ words per sentence (see @fig-speech_samples). <!-- The minimum is observed in the third sentence for child $6$ and the maximum in the second sentence for child $9$, respectively. -->

```{r}
#| label: code-speech_samples1
#| fig-cap: ''
#| echo: true
#| warning: false
speech_samples = with(data_H, table(childID, sentence) ) # <1>
speech_samples
```
1. report speech samples per child and sentence 

```{r}
#| label: fig-speech_samples
#| fig-cap: 'Histogram of words per sentences in the speech samples'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 5
speech_samples = data.frame( speech_samples )
hist(speech_samples$Freq, breaks=20, xlim=c(2, 12), # <1>
  main='', xlab='words per sentence')
```
1. histogram of words per sentence

```{r}
#| label: code-speech_samples2
#| fig-cap: ''
#| echo: true
#| warning: false
psych::describe( speech_samples$Freq ) # <1>
```
1. 
statistical descriptors for the speech samples

Moreover, the data comprised $16$ normal hearing children (*NH*, hearing status category $1$) and $16$ hearing impaired children with cochlear implant (*HI/CI*, hearing status category $2$). Most of the *NH* and *HI/CI* children were $82$ and $85$ months old, respectively. Additionally, the minimum chronological age registered in the sample is $68$ months.

```{r}
#| label: code-A_HS
#| fig-cap: ''
#| echo: true
#| warning: false
d_mom = unique( data_H[,c('childID','HS','A')]) # <1>
with( d_mom, table( A, HS ) ) # <2>
```
1. unique hearing status and chronological age per child
2. number of children per chronological age and hearing status

After examining the data, it is emphasized that no observations were excluded from the modeling process based on preliminary or univariate procedures. As detailed in @sec-interlude5, the identification of *outliers* through univariate procedures is prone to the erroneous exclusion of valuable information, which can ultimately bias the parameter estimates. Consequently, their identification was performed within the context of the proposed models, as recommended by McElreath [-@McElreath_2020].

Lastly, before fitting the models using Bayesian inference, the data was formatted as a list including all the necessary information for the fitting process:

```{r}
#| label: code-datalist
#| fig-cap: ''
#| echo: true
#| warning: false
dlist = list(
  N = nrow(data_H), # <1>
  B = max(data_H$block), # <2>
  I = max(data_H$childID), # <3>
  U = max(data_H$sentence), # <4>
  W = max(data_H$word), # <5>
  cHS = max(data_H$hearing_status), # <6>
  bid = data_H$block, # <7>
  cid = data_H$childID, # <8>
  uid = data_H$sentence, # <9>
  HS = data_H$hearing_status, # <10>
  Am = with( data_H, chronological_age - min(chronological_age) ), # <11>
  Hwsib = data_H$Hwsib # <12>
)
str(dlist)
```
1. Number of observations
2. Maximum number of blocks
3. Maximum number of children
4. Maximum number of sentences
5. Maximum number of words
6. 
Maximum number of categories in hearing status
7. Observation block ID
8. Observation child ID
9. Observation sentence ID
10. Observation hearing status
11. Observation chronological age
12. Observation entropy score

# Methods {#sec-methods}

This section presents the probabilistic formalism for the statistical models utilized in the current study. It also details the estimation procedure employed to fit the models, and the criteria used to assess the quality of the Bayesian inference results. Lastly, it presents the set of fitted models, and the methodology for selecting among them.

The Bayesian inference procedures are elaborated following closely the *When-to-Worry-and-How-to-Avoid-the-Misuse-of-Bayesian-Statistics checklist (WAMBS checklist)* [@Depaoli_et_al_2017]. The *WAMBS checklist* is a questionnaire designed to outline the ten main points that should be meticulously examined when employing Bayesian inference procedures, with the ultimate goal of enhancing the transparency and replicability of the analysis.

::: {.column-margin}
**_WAMBS checklist_** 

Questionnaire designed to outline the ten main points that should be meticulously examined when employing Bayesian inference procedures, with the ultimate goal of enhancing the transparency and replicability of the analysis [@Depaoli_et_al_2017].
:::

## Models {#sec-models}

### Normal GLLAMM {#sec-normal_GLLAMM}

A *generalized linear, latent and mixed model (GLLAMM)* [@Skrondal_et_al_2004a; @Rabe_et_al_2004a; @Rabe_et_al_2004c; @Rabe_et_al_2004b] is a unifying framework used to estimate (non)linear relationships between the mean of (non)normally distributed manifest variables with clustered observations, while also estimating relationships between *latent variables* and one or more covariates. Under this framework, the normal linear, latent and mixed model is composed of two parts: a response and a structural equation part. 
The probabilistic representation of the response part can be stated as follows:

::: {.column-margin}
**_Generalized linear, latent and mixed model (GLLAMM)_**

Unifying framework used to estimate (non)linear relationships between the mean of (non)normally distributed manifest variables with clustered observations, while also estimating relationships between *latent variables* and one or more covariates [@Skrondal_et_al_2004a; @Rabe_et_al_2004a; @Rabe_et_al_2004c; @Rabe_et_al_2004b].
:::

$$
\begin{align}
H_{wsib} & \sim \text{Normal} \left( \mu_{ib}, \sigma_{i} \right) \\
\mu_{ib} &= a_{b} - SI_{i}
\end{align} 
$$ {#eq-normal_GLLAM_response}

where $H_{wsib}$ denotes the manifest word-level entropy scores, for sentence $s$ and child $i$ in block $b$. Furthermore, $\mu_{ib}$ denotes the mean of the outcome and $\sigma_{i}$ its standard deviation. Additionally, $a_{b}$ denotes the block random effect, and $SI_{i}$ denotes the speaker's (latent) *potential intelligibility* score. 

On the other hand, the structural equation part, which relates the children's potential intelligibility to the child-related factors, can be represented probabilistically in the following manner:

$$
\begin{align}
SI_{i} = \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{i}
\end{align} 
$$ {#eq-normal_GLLAM_structural}

where $HS_{i}$ defines the hearing status and $A_{i}$ the chronological age in months of child $i$. $\alpha$ defines the general intercept, and $\alpha_{HS[i]}$ denotes the potential intelligibility for the different hearing status groups. Moreover, $\beta_{A,HS[i]}$ denotes the evolution of potential intelligibility per unit of chronological age for each hearing status group. 
Lastly, $e_{i}$ denotes the unexplained potential intelligibility differences between children, and $u_{i}$ the average unexplained potential intelligibility variability among sentences for each child, defined as $u_{i} = \sum_{s=1}^{S} u_{si}/S$, with $S$ denoting the total number of sentences per child.

Five features can be noticed in the previous probabilistic representations. First, the full definition of the model shows four of the five components of a Bayesian GLLAMM: the response model, with its likelihood component, linear predictor and link function; and the structural equation part. The fifth component, priors and hyperpriors, is fully developed in @sec-priors. @eq-normal_GLLAM_response reveals that, for the likelihood component of the response model, the outcome is assumed to be normally distributed with mean $\mu_{ib}$ and standard deviation $\sigma_{i}$. Moreover, the linear predictor of the response model is a linear combination of the parameters $a_{b}$ and $SI_{i}$ with a direct link function, implying the absence of a transformation of the linear predictor. Lastly, @eq-normal_GLLAM_structural shows that the structural equation part relates the potential intelligibility with the child-related factors.

::: {.column-margin}
**_GLLAMM components_**

1. Response model, random component
2. Response model, linear predictor
3. Response model, link function
4. Structural equation model
5. Priors and hyperpriors
:::

Second, @eq-normal_GLLAM_response reveals the response model representation can consider a different standard deviation per child (indicated by the subscript $i$). This trait effectively makes the regression model *robust* to small departures from the normal distributional assumption or to the presence of *outliers* (refer to @sec-interlude5).

Third, @eq-normal_GLLAM_response also shows that, in the response model representation, the potential intelligibility has a negative linear relationship with the average entropy scores. 
This feature makes explicit the inverse relationship between the potential intelligibility and entropy scores detailed in @sec-entropy_calculation.

Fourth, @eq-normal_GLLAM_structural shows the chronological age is *centered* at the minimum age $\bar{A}$. The *centering* procedure is done to facilitate the interpretation of the parameters [@Everitt_et_al_2010]. Specifically, the centering is done to avoid interpreting the parameter estimates outside the range of the chronological ages available in the data. 

::: {.column-margin}
**_Centering_** 

Procedure used to facilitate the interpretation of regression parameters [@Everitt_et_al_2010].
:::

Fifth and last, @eq-normal_GLLAM_structural shows the model assumes the potential intelligibility has two sources of unexplained variability. One comes from the specific selection of children $e_{i}$, and another from the specific selection of sentences for each child $u_{i}$. While $e_{i}$ assumes that different children inherently have different levels of potential intelligibility, $u_{i}$ assumes that different sentences measure the potential intelligibility of children differently. The latter is assumed to be due to the different word difficulties and their interplay with each other within the sentence.

### Beta-proportion GLLAMM {#sec-beta_GLLAMM}

In a similar fashion, the proposed beta-proportion linear, latent and mixed model was composed of two parts: a response and a structural part. The response part assumed the following probabilistic structure:

$$
\begin{align}
H_{wsib} & \sim \text{BetaProp} \left( \mu_{ib}, M_{i} \right) \\
\mu_{ib} &= \text{logit}^{-1}( a_{b} - SI_{i} )
\end{align} 
$$ {#eq-beta_GLLAM_response}

On the other hand, the structural equation part is defined as in @eq-normal_GLLAM_structural, i.e., in the same way as in the normal linear, latent and mixed model. Similar to the normal GLLAMM, the probabilistic representation of the beta-proportion GLLAMM has three main features. 
First, the representation also highlights four of the five components of a GLLAMM. For the likelihood component of the response model, it is assumed the outcome is beta-proportion distributed with mean $\mu_{ib}$ and sample size $M_{i}$. The linear predictor of the response model is a linear combination of the parameters $a_{b}$ and $SI_{i}$, with an inverse-logit link function $\text{logit}^{-1}(x) = exp(x) / (1+exp(x))$. Lastly, the structural equation part also relates the children's potential intelligibility with the child-related factors.

Second, @eq-beta_GLLAM_response reveals the response model representation also considers different sample sizes per child (indicated by the subscript in $M_{i}$). As in the normal GLLAMM, models with this trait were dubbed *robust regression models* (refer to @sec-interlude5).

Third and last, @eq-beta_GLLAM_response shows that, in the response model representation, the children's potential intelligibility has a negative (non)linear relationship with the entropy scores. This feature still makes explicit the inverse relationship between the children's potential intelligibility and entropy scores, as seen in @sec-entropy_calculation.

## Prior distributions {#sec-priors}

In this section, *prior predictive simulations* are conducted for all parameters that necessitate prior assumptions. *Prior predictive simulation* is a procedure employed to generate data based on the prior distributions. The purpose of the procedure is to assign sensible priors and comprehend their implications in the context of a model, before integrating any information derived from empirical data [@McElreath_2020].

::: {.column-margin}
**_Prior predictive simulations_** 

Procedure employed to generate data based on the prior distributions. 
The purpose of the procedure is to assign sensible priors and comprehend their implications in the context of a model, before integrating any information derived from empirical data [@McElreath_2020].
:::

The procedure involved simulating parameters independently in either the structural or response part of the models. These parameters were later transformed into the entropy score scale through the response part. The specific steps employed are found in the adjacent code within the parameters' corresponding sections [^5].

[^5]: An interested reader can refer to McElreath [-@McElreath_2020] for a detailed explanation of prior predictive simulations.

The probabilistic representation for the normal GLLAMM shows that seven parameters require the assumption of a prior distribution (refer to @sec-normal_GLLAMM). Similarly, the representation for the beta-proportion GLLAMM also requires seven prior distribution assumptions (refer to @sec-beta_GLLAMM).

The parameters found in the structural part of both models can be enumerated as follows: 

1. the general intercept $\alpha$,
2. the hearing status effects $\alpha_{HS[i]}$, 
3. the chronological age per hearing status effects $\beta_{A,HS[i]}$, 
4. the child variability $e_{i}$, and
5. the child-sentence average variability $u_{i}$.

Furthermore, the parameters present in the response part of either model are:

1. the random block effect $a_{b}$, present in both models,
2. the standard deviation $\sigma_{i}$, present only in the normal GLLAMM, and 
3. the sample size $M_{i}$, present only in the beta-proportion GLLAMM.

Lastly, after the establishment of all the prior distributions, a prior predictive simulation is also provided for the children's potential intelligibility $SI_{i}$.

### Intercepts $\alpha$

The parameter's prior distribution for the normal and beta-proportion GLLAMM is described in @eq-alpha_prior. 
The distribution serves as an informative prior, which aims to set the scale for the speaker's (latent) potential intelligibility, a requirement often seen in latent variable models [@Depaoli_2021] (refer to @sec-prior_effects). 

$$
\begin{align}
\alpha &\sim \text{Normal} \left( 0, 0.05 \right)
\end{align} 
$$ {#eq-alpha_prior}

The left panel of @fig-alpha_prior_normalGLLAMM shows the prior restricts $\alpha$ to lie mostly within the range $[-0.1, 0.1]$. The implications for the scale of potential intelligibility are: (1) no particular bias towards a positive or negative intercept is evident, and (2) the scale is initially centered around zero with high certainty. On the other hand, the right panel of @fig-alpha_prior_normalGLLAMM demonstrates that when the $\alpha$ prior is transformed to the entropy scale under the normal GLLAMM, the model anticipates a concentration of data around lower levels of entropy, and beyond the feasible range of the outcome. This prior might seem inappropriate at first glance, but other priors hide even more undesirable assumptions.

```{r}
#| label: fig-alpha_prior_normalGLLAMM
#| fig-cap: 'Normal GLLAMM, general intercept prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
require(rethinking) # <1>

n = 1000 # <2>
param_pscale = rnorm( n=n, mean=0, sd=0.05 ) # <3> 
param_oscale = -1*param_pscale # <4>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1), # <5>
  show.HPDI=0.95,
  main='Intelligibility scale',
  xlab=expression(alpha))
dens( param_oscale, xlim=c(0,1), # <6>
  show.HPDI=0.95,
  main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. Normal GLLAMM general intercept prior distribution definition: intelligibility scale
4. Normal GLLAMM general intercept prior distribution definition: entropy scale
5. 
Normal GLLAMM general intercept prior distribution density plot: unrestricted intelligibility scale
6. Normal GLLAMM general intercept prior distribution density plot: bounded entropy scale

In contrast, the right panel of @fig-alpha_prior_betaGLLAMM demonstrates that when the $\alpha$ prior is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a concentration of data around mid levels of entropy, and not beyond the feasible range of the outcome. This implies that no particular bias towards higher or lower entropy scores is expected by the model.

```{r}
#| label: fig-alpha_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, general intercept prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
require(rethinking) # <1>

n = 1000 # <2>
param_pscale = rnorm( n=n, mean=0, sd=0.05 ) # <3> 
param_oscale = inv_logit( -1*param_pscale ) # <4>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1), # <5>
  show.HPDI=0.95,
  main='Intelligibility scale', xlab=expression(alpha))
dens( param_oscale, xlim=c(0,1), # <6>
  show.HPDI=0.95,
  main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. Beta-proportion GLLAMM general intercept prior distribution definition: intelligibility scale
4. Beta-proportion GLLAMM general intercept prior distribution definition: entropy scale
5. Beta-proportion GLLAMM general intercept prior distribution density plot: unrestricted intelligibility scale
6. Beta-proportion GLLAMM general intercept prior distribution density plot: bounded entropy scale

### Hearing status effects $\alpha_{HS[i]}$

The prior distribution of the parameters for the normal and beta-proportion GLLAMM is described in @eq-alphaHS_prior. For each of the two hearing status categories present in the data there is one prior (refer to @sec-data_exploration). 
Notably, the same prior is applied to both categories. This choice implies that the parameters for each category are presumed to have similar uncertainties prior to the observation of empirical data.

$$
\begin{align}
\alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.3 \right)
\end{align} 
$$ {#eq-alphaHS_prior}

The left panel of @fig-alphaHS_prior_normalGLLAMM reveals a weakly informative prior that restricts $\alpha_{HS[i]}$ mostly to the range $[-0.6, 0.6]$ (refer to @sec-prior_effects). This implies that no particular bias towards positive or negative values for the potential intelligibility of the different hearing status groups is present in the priors.

However, the right panel of @fig-alphaHS_prior_normalGLLAMM demonstrates that when the prior of $\alpha_{HS[i]}$ is transformed to the entropy scale, the model anticipates a concentration of data around lower levels of entropy, and outside the feasible range of the outcome. This prior might seem inappropriate at first glance, but other priors hide even more undesirable assumptions.

```{r}
#| label: fig-alphaHS_prior_normalGLLAMM
#| fig-cap: 'Normal GLLAMM, hearing status effects prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
require(rethinking) # <1>

n = 1000 # <2>
param_pscale = rnorm( n=n, mean=0, sd=0.3 ) # <3> 
param_oscale = -1*param_pscale # <4>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1), # <5>
  show.HPDI=0.95,
  main='Intelligibility scale', xlab=expression(alpha[HS]))
dens( param_oscale, xlim=c(0,1), # <6>
  show.HPDI=0.95,
  main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```
1. package requirement
2. simulated sample size
3. Normal GLLAMM hearing status effects prior distribution definition: intelligibility scale
4. Normal GLLAMM hearing status effects prior distribution definition: entropy scale
5. 
Normal GLLAMM hearing status effects prior distribution density plot: unrestricted intelligibility scale
6. Normal GLLAMM hearing status effects prior distribution density plot: bounded entropy scale

Conversely, the right panel of @fig-alphaHS_prior_betaGLLAMM demonstrates that when the $\alpha_{HS[i]}$ prior is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a concentration of data around mid levels of entropy, and not beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative entropy scores is expected by the model.

```{r}
#| label: fig-alphaHS_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, hearing status effects prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
param_pscale = rnorm( n=n, mean=0, sd=0.3 ) # <3>
param_oscale = inv_logit( -1*param_pscale ) # <4>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1), # <5>
      show.HPDI=0.95,
      main='Intelligibility scale', xlab=expression(alpha[HS]))
dens( param_oscale, xlim=c(0,1), # <6>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. Beta-proportion GLLAMM hearing status effects prior distribution definition: intelligibility scale
4. Beta-proportion GLLAMM hearing status effects prior distribution definition: entropy scale
5. Beta-proportion GLLAMM hearing status effects prior distribution density plot: unrestricted intelligibility scale
6. Beta-proportion GLLAMM hearing status effects prior distribution density plot: bounded entropy scale

### Chronological age per hearing status $\beta_{A,HS[i]}$

The prior distribution of $\beta_{A,HS[i]}$ for the normal and beta-proportion GLLAMM is described in @eq-betaHS_prior.
For each of the two hearing status categories present in the data there is one prior (refer to @sec-data_exploration). Notably, the same prior is applied to both categories. This choice implies that the evolution of potential intelligibility attributed to chronological age between the categories is presumed to have similar uncertainties prior to the observation of empirical data. Furthermore, even when the model did not consider the interaction of chronological age and hearing status ($\beta_{A}$ instead of $\beta_{A,HS[i]}$), the same prior was utilized (refer to @sec-fitted).

$$
\begin{align}
\beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right)
\end{align}
$$ {#eq-betaHS_prior}

The left panel of @fig-betaHS_prior_normalGLLAMM shows a weakly informative prior that restricts $\beta_{A,HS[i]}$ to be within the range of $[-0.2, 0.2]$. This implies that no particular bias towards positive or negative effects is present for the evolution of potential intelligibility due to chronological age per hearing status group. The right panel of @fig-betaHS_prior_normalGLLAMM, on the other hand, demonstrates that when the prior of $\beta_{A,HS[i]}$ is transformed to the entropy scale, the model anticipates a concentration of lower entropy values. More importantly, the model expects entropy scores beyond the feasible range of the outcome.
This assumption may initially appear inappropriate; however, considering again the probabilistic representation of the model, other priors hide even more undesirable assumptions.

```{r}
#| label: fig-betaHS_prior_normalGLLAMM
#| fig-cap: 'Normal GLLAMM, chonological age per hearing status effects prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
param_pscale = rnorm( n=n, mean=0, sd=0.1 ) # <3>
usd = function(i){ param_pscale * data_H$Am[i] } # <4>
param_mscale = sapply( 1:length(data_H$Am), usd ) # <5>
param_oscale = -1*param_mscale # <6>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1), # <7>
      show.HPDI=0.95,
      main='Intelligibility scale', xlab=expression(beta[AHS]))
dens( param_oscale, xlim=c(0,1), # <8>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. Normal GLLAMM chronological age per hearing status effects prior distribution definition: intelligibility scale
4. user defined function: effects per chronological age
5. Normal GLLAMM chronological age per hearing status effects prior distribution calculation: intelligibility scale
6. Normal GLLAMM chronological age per hearing status effects prior distribution definition: entropy scale
7. Normal GLLAMM chronological age per hearing status effects prior distribution density plot: unrestricted intelligibility scale
8. Normal GLLAMM chronological age per hearing status effects prior distribution density plot: bounded entropy scale

In contrast, the right panel of @fig-betaHS_prior_betaGLLAMM demonstrates that when the prior of $\beta_{A,HS[i]}$ is transformed to the entropy scale, the model anticipates a slight concentration of data around mid levels of entropy, but more importantly, it does not expect data beyond the feasible range of the outcome.
This implies that no particular bias towards positive or negative effects for the evolution of potential intelligibility per hearing status group is present.

```{r}
#| label: fig-betaHS_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, chonological age per hearing status effects prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
param_pscale = rnorm( n=n, mean=0, sd=0.1 ) # <3>
usd = function(i){ param_pscale * data_H$Am[i] } # <4>
param_mscale = sapply( 1:length(data_H$Am), usd ) # <5>
param_oscale = inv_logit( -1*param_mscale ) # <6>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1), # <7>
      show.HPDI=0.95,
      main='Intelligibility scale', xlab=expression(beta[AHS]))
dens( param_oscale, xlim=c(0,1), # <8>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. Beta-proportion GLLAMM chronological age per hearing status effects prior distribution definition: intelligibility scale
4. user defined function: effects per chronological age
5. Beta-proportion GLLAMM chronological age per hearing status effects prior distribution calculation: intelligibility scale
6. Beta-proportion GLLAMM chronological age per hearing status effects prior distribution definition: entropy scale
7. Beta-proportion GLLAMM chronological age per hearing status effects prior distribution density plot: unrestricted intelligibility scale
8. Beta-proportion GLLAMM chronological age per hearing status effects prior distribution density plot: bounded entropy scale

### Child differences $e_{i}$

The parameter $e_{i}$ was defined in terms of hyperparameters (refer to @sec-hyperpriors). Specifically, the prior distribution of $e_{i}$ for the normal and beta-proportion GLLAMM is described in @eq-ei_prior. For each of the children present in the data there is one prior (refer to @sec-data_exploration).
Notably, the same prior is applied to all children, with a common mean and standard deviation defined by the hyperparameters. This choice implies that potential intelligibility differences between children are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters.

$$
\begin{align}
m_{i} &\sim \text{Normal} \left( 0, 0.05 \right) \\
s_{i} &\sim \text{Exponential} \left( 2 \right) \\
e_{i} &\sim \text{Normal} \left( m_{i}, s_{i} \right)
\end{align}
$$ {#eq-ei_prior}

The left panel of @fig-ei_prior_normalGLLAMM shows a weakly informative prior that restricts the potential intelligibility differences to be in a range of $[-1.5, 1.5]$. This implies that differences in potential intelligibility between children can be as large as three units of measurement. However, the right panel of @fig-ei_prior_normalGLLAMM demonstrates that when the prior of $e_{i}$ is transformed to the entropy scale under the normal GLLAMM, the model again anticipates a concentration of data around lower levels of entropy, and even expects the differences to go beyond the feasible range of the outcome.
This assumption may initially appear inappropriate; however, considering again the probabilistic representation of the model, other priors hide even more undesirable assumptions.

```{r}
#| label: fig-ei_prior_normalGLLAMM
#| fig-cap: 'Normal GLLAMM, children differences prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
m_i = rnorm( n=n, mean=0, sd=0.05 ) # <3>
s_i = rexp( n=n, rate=2 ) # <4>
param_pscale = rnorm( n=n, mean=m_i, sd=s_i ) # <5>
param_oscale = -1*param_pscale # <6>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-2,2), # <7>
      show.HPDI=0.95,
      main='Intelligibility scale', xlab=expression(e[i]))
dens( param_oscale, xlim=c(0,1), # <8>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. mean hyperparameter prior distribution definition
4. standard deviation hyperparameter prior distribution definition
5. Normal GLLAMM child differences prior distribution calculation: intelligibility scale
6. Normal GLLAMM child differences prior distribution definition: entropy scale
7. Normal GLLAMM child differences prior distribution density plot: unrestricted intelligibility scale
8. Normal GLLAMM child differences prior distribution density plot: bounded entropy scale

Conversely, the right panel of @fig-ei_prior_betaGLLAMM demonstrates that when the prior of $e_{i}$ is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a high concentration of data around mid levels of entropy. However, it does not expect data beyond the feasible range of the outcome.
This implies that no particular bias towards positive or negative differences in potential intelligibility between children is expected at the entropy scale level.

```{r}
#| label: fig-ei_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, children differences prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
m_i = rnorm( n=n, mean=0, sd=0.05 ) # <3>
s_i = rexp( n=n, rate=2 ) # <4>
param_pscale = rnorm( n=n, mean=m_i, sd=s_i ) # <5>
param_oscale = inv_logit( -1*param_pscale ) # <6>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-2,2), # <7>
      show.HPDI=0.95,
      main='Intelligibility scale', xlab=expression(e[i]))
dens( param_oscale, xlim=c(0,1), # <8>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. mean hyperparameter prior distribution definition
4. standard deviation hyperparameter prior distribution definition
5. Beta-proportion GLLAMM child differences prior distribution calculation: intelligibility scale
6. Beta-proportion GLLAMM child differences prior distribution definition: entropy scale
7. Beta-proportion GLLAMM child differences prior distribution density plot: unrestricted intelligibility scale
8. Beta-proportion GLLAMM child differences prior distribution density plot: bounded entropy scale

### Sentence-child average differences $u_{i}$

The parameter $u_{i}$ was defined in terms of hyperparameters (refer to @sec-hyperpriors). Specifically, the prior distribution of $u_{i}$ for the normal and beta-proportion GLLAMM is described in @eq-usi_prior. For each sentence within children present in the data there is one prior, and these are later aggregated across all sentences $S$, as defined in @sec-normal_GLLAMM. Notably, the same prior is applied to all children, with a common mean and standard deviation defined by the hyperparameters.
This choice implies that the average potential intelligibility differences among sentences within children are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters.

$$
\begin{align}
m_{u} &\sim \text{Normal} \left( 0, 0.05 \right) \\
s_{u} &\sim \text{Exponential} \left( 2 \right) \\
u_{si} &\sim \text{Normal} \left( m_{u}, s_{u} \right) \\
u_{i} &= \sum_{s=1}^{S} \frac{u_{si}}{S}
\end{align}
$$ {#eq-usi_prior}

The left panel of @fig-ui_prior_normalGLLAMM shows a weakly informative prior that restricts $u_{i}$ to a range between $[-0.5, 0.5]$. This implies that average differences in potential intelligibility among sentences within children can be as large as one unit of measurement. The right panel of @fig-ui_prior_normalGLLAMM, in contrast, demonstrates that when the prior of $u_{i}$ is transformed to the entropy scale, the model anticipates a concentration of data around lower levels of entropy. More importantly, the model expects the differences to go beyond the feasible range of the outcome. Again, this prior may initially appear inappropriate, but other priors hide even more undesirable assumptions.

```{r}
#| label: fig-ui_prior_normalGLLAMM
#| fig-cap: 'Normal GLLAMM, sentence-children average differences prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
m_i = rnorm( n=n, mean=0, sd=0.05 ) # <3>
s_i = rexp( n=n, rate=2 ) # <4>
param_pscale = replicate( # <5>
  n=n, expr=mean( rnorm( n=10, mean=m_i, sd=s_i ) ) )
param_oscale = -1*param_pscale # <6>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1), # <7>
      show.HPDI=0.95,
      main='Intelligibility scale', xlab=expression(u[i]))
dens( param_oscale, xlim=c(0,1), # <8>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. mean hyperparameter prior distribution definition
4. standard deviation hyperparameter prior distribution definition
5. Normal GLLAMM sentence-child average differences prior distribution calculation: intelligibility scale
6. Normal GLLAMM sentence-child average differences prior distribution definition: entropy scale
7. Normal GLLAMM sentence-child average differences prior distribution density plot: unrestricted intelligibility scale
8. Normal GLLAMM sentence-child average differences prior distribution density plot: bounded entropy scale

Conversely, the right panel of @fig-ui_prior_betaGLLAMM demonstrates that when the prior of $u_{i}$ is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a high concentration of data around mid levels of entropy. However, it does not expect data beyond the feasible range of the outcome. This implies that no particular bias towards positive or negative differences between children is expected at the entropy scale level.

```{r}
#| label: fig-ui_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, sentence-children average differences prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
m_i = rnorm( n=n, mean=0, sd=0.05 ) # <3>
s_i = rexp( n=n, rate=2 ) # <4>
param_pscale = replicate( # <5>
  n=n, expr=mean( rnorm( n=10, mean=m_i, sd=s_i ) ) )
param_oscale = inv_logit( -1*param_pscale ) # <6>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-1,1), # <7>
      show.HPDI=0.95,
      main='Intelligibility scale', xlab=expression(u[i]))
dens( param_oscale, xlim=c(0,1), # <8>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. mean hyperparameter prior distribution definition
4. standard deviation hyperparameter prior distribution definition
5. Beta-proportion GLLAMM sentence-child average differences prior distribution calculation: intelligibility scale
6. Beta-proportion GLLAMM sentence-child average differences prior distribution definition: entropy scale
7. Beta-proportion GLLAMM sentence-child average differences prior distribution density plot: unrestricted intelligibility scale
8. Beta-proportion GLLAMM sentence-child average differences prior distribution density plot: bounded entropy scale

### Random block effect $a_{b}$

The parameter $a_{b}$ was also defined in terms of hyperparameters (refer to @sec-hyperpriors). Specifically, the prior distribution of $a_{b}$ for the normal and beta-proportion GLLAMM is described in @eq-ab_prior. For each block present in the data there is one prior (refer to @sec-data_exploration). The same prior is applied to all blocks, with a common mean and standard deviation defined by the hyperparameters. This choice implies that block differences are presumed to have similar uncertainties prior to the observation of empirical data, and that these are governed by a set of common parameters.

$$
\begin{align}
m_{b} &\sim \text{Normal} \left( 0, 0.05 \right) \\
s_{b} &\sim \text{Exponential} \left( 2 \right) \\
a_{b} &\sim \text{Normal} \left( m_{b}, s_{b} \right)
\end{align}
$$ {#eq-ab_prior}

The left panel of @fig-ab_prior_normalGLLAMM shows a weakly informative prior restricting $a_{b}$ to a range between $[-1.5, 1.5]$, implying that no particular bias towards positive or negative differences between blocks is present in the priors. Nevertheless, the right panel of @fig-ab_prior_normalGLLAMM demonstrates that when the prior of $a_{b}$ is transformed to the entropy scale under the normal GLLAMM, the model anticipates a concentration of data around lower levels of entropy.
Furthermore, the model expects the differences to go beyond the feasible range of the outcome.

```{r}
#| label: fig-ab_prior_normalGLLAMM
#| fig-cap: 'Normal GLLAMM, block differences prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
m_i = rnorm( n=n, mean=0, sd=0.05 ) # <3>
s_i = rexp( n=n, rate=2 ) # <4>
param_pscale = rnorm( n=n, mean=m_i, sd=s_i ) # <5>
param_oscale = -1*param_pscale # <6>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-2,2), # <7>
      show.HPDI=0.95,
      main='Intelligibility scale', xlab=expression(a[b]))
dens( param_oscale, xlim=c(0,1), # <8>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. mean hyperparameter prior distribution definition
4. standard deviation hyperparameter prior distribution definition
5. Normal GLLAMM block differences prior distribution calculation: intelligibility scale
6. Normal GLLAMM block differences prior distribution definition: entropy scale
7. Normal GLLAMM block differences prior distribution density plot: unrestricted intelligibility scale
8. Normal GLLAMM block differences prior distribution density plot: bounded entropy scale

In contrast, the right panel of @fig-ab_prior_betaGLLAMM demonstrates that when the prior of $a_{b}$ is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a high concentration of data around mid levels of entropy, but more importantly, it does not expect data beyond the feasible range of the outcome.
This implies that no particular bias towards positive or negative differences between blocks is expected at the entropy scale level.

```{r}
#| label: fig-ab_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, block differences prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
m_i = rnorm( n=n, mean=0, sd=0.05 ) # <3>
s_i = rexp( n=n, rate=2 ) # <4>
param_pscale = rnorm( n=n, mean=m_i, sd=s_i ) # <5>
param_oscale = inv_logit( -1*param_pscale ) # <6>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-2,2), # <7>
      show.HPDI=0.95,
      main='Intelligibility scale', xlab=expression(a[b]))
dens( param_oscale, xlim=c(0,1), # <8>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. mean hyperparameter prior distribution definition
4. standard deviation hyperparameter prior distribution definition
5. Beta-proportion GLLAMM block differences prior distribution calculation: intelligibility scale
6. Beta-proportion GLLAMM block differences prior distribution definition: entropy scale
7. Beta-proportion GLLAMM block differences prior distribution density plot: unrestricted intelligibility scale
8. Beta-proportion GLLAMM block differences prior distribution density plot: bounded entropy scale

### Standard deviation $\sigma_{i}$

The prior distribution of $\sigma_{i}$ for the normal GLLAMM is described in @eq-normal_GLLAM_sigma_prior. For each of the children present in the data there is one prior (refer to @sec-data_exploration). Notably, the same prior is applied to all $\sigma_{i}$. This choice implies that the parameters for the unexplained word-level entropy variability of each child are presumed to have similar uncertainties prior to the observation of empirical data.
Furthermore, even when the model did not consider this *robust* feature and estimated one common variability for all children ($\sigma$ instead of $\sigma_{i}$), the same prior was utilized (see @sec-normal_GLLAMM and @sec-fitted).

$$
\begin{align}
\sigma_{i} &\sim \text{Exponential}\left( 2 \right)
\end{align}
$$ {#eq-normal_GLLAM_sigma_prior}

The left panel of @fig-sigma_prior_normalGLLAMM shows that the weakly informative prior restricts $\sigma_{i}$ to a positive range, most likely between $[0, 1.5]$, as is required for variability parameters [@Depaoli_2021]. However, the right panel of @fig-sigma_prior_normalGLLAMM shows that when the prior of $\sigma_{i}$ is transformed to the entropy scale, the model expects certain scores to fall beyond the feasible range of the outcome. This assumption may initially appear inappropriate. However, considering again the probabilistic representation of the model, other priors under the normal GLLAMM hide even more undesirable assumptions.

```{r}
#| label: fig-sigma_prior_normalGLLAMM
#| fig-cap: 'Normal GLLAMM, word-level entropy unexplained variability prior distribution: parameter and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
param_pscale = rexp( n=n, rate=2 ) # <3>
param_oscale = rnorm( n=n, mean=0, sd=param_pscale ) # <4>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,4), # <5>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(sigma[i]))
dens( param_oscale, xlim=c(0,1), # <6>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. Normal GLLAMM word-level entropy variability prior distribution definition: parameter scale
4. Normal GLLAMM word-level entropy variability prior distribution definition: entropy scale
5. Normal GLLAMM word-level entropy variability prior distribution density plot: restricted parameter scale
6. Normal GLLAMM word-level entropy variability distribution density plot: bounded entropy scale

### Sample size $M_{i}$

The prior distribution of $M_{i}$ for the beta-proportion GLLAMM is described in @eq-beta_GLLAM_m_prior. Similar to the previous parameter, for each of the children present in the data there is one prior (refer to @sec-data_exploration). Notably, the same prior is applied to all $M_{i}$. This choice implies that the parameters for the unexplained word-level entropy variability of each child are presumed to have similar uncertainties prior to the observation of empirical data. Furthermore, even when the model did not consider this *robust* feature and estimated one common variability for all children ($M$ instead of $M_{i}$), the same prior was utilized (see @sec-beta_GLLAMM and @sec-fitted).

$$
\begin{align}
M_{i} &\sim \text{Exponential}\left( 0.5 \right)
\end{align}
$$ {#eq-beta_GLLAM_m_prior}

The left and right panels of @fig-m_prior_betaGLLAMM demonstrate that the prior of $M_{i}$ expects the parameters to be in a reasonable positive range between $[0, 5]$, and within the boundaries of the entropy scores. This implies that no particular bias is present in the word-level entropy unexplained variability, only that it is positive, as expected for measures of variability.

```{r}
#| label: fig-m_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, word-level entropy unexplained variability prior distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
param_pscale = rexp( n=n, rate=0.5 ) # <3>
param_oscale = inv_logit( -1*param_pscale ) # <4>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(0,10), # <5>
      show.HPDI=0.95,
      main='Parameter scale', xlab=expression(M[i]))
dens( param_oscale, xlim=c(0,1), # <6>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. Beta-proportion GLLAMM sample size variability prior distribution definition: parameter scale
4. Beta-proportion GLLAMM sample size variability prior distribution definition: entropy scale
5. Beta-proportion GLLAMM sample size variability prior distribution density plot: restricted parameter scale
6. Beta-proportion GLLAMM sample size variability distribution density plot: bounded entropy scale

<!-- The value we choose for the prior $\theta$ can be thought of this way: It is the number of new flips of the coin that we would need to make us teeter between the new data and the prior belief about $\mu$. If we would only need a few new flips to sway our beliefs, then our prior beliefs should be represented by a small $\theta$. If we would need a large number of new flips to sway us away from our prior beliefs about $\mu$, then our prior beliefs are worth a very large $\theta$ \cite{Kruschke_2015}. -->

### Speech intelligibility $SI_{i}$

After the careful assessment of the prior implications for each parameter, the expected prior distribution for the potential intelligibility can be constructed.
The prior predictive simulation for $SI_{i}$ under the normal and beta-proportion GLLAMM can be described as in @eq-SI_prior:

$$
\begin{align}
\alpha &\sim \text{Normal} \left( 0, 0.05 \right) \\
\alpha_{HS[i]} &\sim \text{Normal} \left( 0, 0.3 \right) \\
\beta_{A,HS[i]} &\sim \text{Normal} \left( 0, 0.1 \right) \\
m &\sim \text{Normal} \left( 0, 0.05 \right) \\
s &\sim \text{Exponential} \left( 2 \right) \\
e_{i} &\sim \text{Normal} \left( m, s \right) \\
u_{si} &\sim \text{Normal} \left( m, s \right) \\
u_{i} &= \sum_{s=1}^{S} \frac{u_{si}}{S} \\
SI_{si} &= \alpha + \alpha_{HS[i]} + \beta_{A, HS[i]} (A_{i} - \bar{A}) + e_{i} + u_{i}
\end{align}
$$ {#eq-SI_prior}

The left panel of @fig-SI_prior_normalGLLAMM shows that the prior restricts the potential intelligibility of children to be mostly between $[-6, 6]$, implying that no particular bias towards positive or negative potential intelligibility is present in the priors. However, the right panel of @fig-SI_prior_normalGLLAMM demonstrates that when the prior of $SI_{i}$ is transformed to the entropy scale under the normal GLLAMM, the model anticipates prediction of entropy scores beyond its feasible range.

```{r}
#| label: fig-SI_prior_normalGLLAMM
#| fig-cap: 'Normal GLLAMM, potential intelligibility distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
a = rnorm( n=n, mean=0, sd=0.05 ) # <3>
aHS = rnorm( n=n, mean=0, sd=0.3 ) # <4>
bAHS = rnorm( n=n, mean=0, sd=0.1 ) # <5>
m = rnorm( n=n, mean=0, sd=0.05 ) # <6>
s = rexp( n=n, rate=2 ) # <7>
e_i = rnorm( n=n, mean=m, sd=s ) # <8>
u_i = replicate( n=n, expr=mean( rnorm( n=10, mean=m, sd=s ) ) ) # <9>
param_pscale = a + aHS + bAHS + e_i + u_i # <10>
param_oscale = -1*param_pscale # <11>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-7,7), # <12>
      show.HPDI=0.95,
      main='Intelligibility scale', xlab=expression(SI[i]))
dens( param_oscale, xlim=c(0,1), # <13>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. general intercept prior distribution definition
4. hearing status effects prior distribution definition
5. chronological age per hearing status effects prior distribution definition
6. mean hyperparameter prior distribution definition
7. standard deviation hyperparameter prior distribution definition
8. child differences prior distribution definition
9. sentence-child differences effects prior distribution definition
10. Normal GLLAMM potential intelligibility prior distribution calculation: intelligibility scale
11. Normal GLLAMM potential intelligibility prior distribution definition: entropy scale
12. Normal GLLAMM potential intelligibility prior distribution density plot: unrestricted intelligibility scale
13. Normal GLLAMM potential intelligibility prior distribution density plot: bounded entropy scale

In contrast, the right panel of @fig-SI_prior_betaGLLAMM demonstrates that when the prior of $SI_{i}$ is transformed to the entropy scale under the beta-proportion GLLAMM, the model anticipates a high concentration of data at the extreme levels of entropy, with a lower probability for scores around mid-levels.
Furthermore, the model does not expect data beyond the feasible range of the outcome.

```{r}
#| label: fig-SI_prior_betaGLLAMM
#| fig-cap: 'Beta-proportion GLLAMM, potential intelligibility distribution: intelligibility and entropy scale'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10

require(rethinking) # <1>

n = 1000 # <2>
a = rnorm( n=n, mean=0, sd=0.05 ) # <3>
aHS = rnorm( n=n, mean=0, sd=0.3 ) # <4>
bAHS = rnorm( n=n, mean=0, sd=0.1 ) # <5>
m = rnorm( n=n, mean=0, sd=0.05 ) # <6>
s = rexp( n=n, rate=2 ) # <7>
e_i = rnorm( n=n, mean=m, sd=s ) # <8>
u_i = replicate( n=n, expr=mean( rnorm( n=10, mean=m, sd=s ) ) ) # <9>
param_pscale = a + aHS + bAHS + e_i + u_i # <10>
param_oscale = inv_logit( -1*param_pscale ) # <11>

par(mfrow=c(1,2))
dens( param_pscale, xlim=c(-7,7), # <12>
      show.HPDI=0.95,
      main='Intelligibility scale', xlab=expression(SI[si]))
dens( param_oscale, xlim=c(0,1), # <13>
      show.HPDI=0.95,
      main='Entropy scale', xlab=expression(H[wsib]) )
abline( v=c(0, 1), lty=2, col='gray')
par(mfrow=c(1,1))
```

1. package requirement
2. simulated sample size
3. general intercept prior distribution definition
4. hearing status effects prior distribution definition
5. chronological age per hearing status effects prior distribution definition
6. mean hyperparameter prior distribution definition
7. standard deviation hyperparameter prior distribution definition
8. child differences prior distribution definition
9. sentence-child differences effects prior distribution definition
10. Beta-proportion GLLAMM potential intelligibility prior distribution calculation: intelligibility scale
11. Beta-proportion GLLAMM potential intelligibility prior distribution definition: entropy scale
12. Beta-proportion GLLAMM potential intelligibility prior distribution density plot: unrestricted intelligibility scale
13. Beta-proportion GLLAMM potential intelligibility prior distribution density plot: bounded entropy scale

## Estimation {#sec-estimation}

The models were estimated using the software *R* version 4.2.2 [@R_2015] and *STAN* version 2.26.1 [@Stan_2020]. R was utilized to handle the data and produce the outputs for the models. STAN was utilized to access the Bayesian estimation procedure. The procedure implemented in STAN was the *Hamiltonian Monte Carlo (HMC)*, specifically, the *No-U-Turn Sampler (NUTS)* [@Hoffman_et_al_2014] [^6].

[^6]: The reader can refer to Brooks et al. [-@Gelman_et_al_2011] for a detailed treatment of HMC methods.

::: {.column-margin}
**_Software_**
:::

Four Markov chains were implemented for each parameter. Each chain had distinct starting values. Due to the selected estimation procedure (HMC-NUTS), no burn-in or thinning procedure was necessary. Nevertheless, the procedure did require a warm-up phase. Consequently, each chain was run for $4,000$ iterations, where the first $2,000$ served as a warm-up phase and the remaining $2,000$ were considered *effective samples* from the posterior distribution.

## Stationarity, convergence and mixing evaluation {#sec-stationarity}

To evaluate the Markov chains' performance in terms of achieving stationarity, convergence, and good mixing, visual inspections of the trace, trace-rank, and autocorrelation plots (ACF) were conducted, as recommended by McElreath [-@McElreath_2020]. Furthermore, the assessment of convergence was also supported by the *potential scale reduction factor statistic* ($\hat{R}$), developed by Gelman and colleagues [-@Gelman_et_al_2014]. Convergence was considered to be achieved if $\hat{R} \leq 1.05$.

## Posterior information evaluation {#sec-inf_posterior}

To assess whether the parameters' posterior distributions have been generated with a sufficient number of uncorrelated sampling points, the posterior distribution density is inspected.
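As a concrete illustration, the estimation settings of @sec-estimation and the diagnostics just described could be obtained from R roughly as sketched below. This is a hypothetical sketch assuming the `rstan` interface, not the study's actual code; the object names `mcmc_code` and `stan_data` are illustrative placeholders for a STAN program and its data list.

```r
# Hypothetical sketch (assumed rstan interface; object names illustrative).
require(rstan)

fit <- stan(
  model_code = mcmc_code,  # a STAN program, e.g. one of the fitted models
  data       = stan_data,  # data list matching the program's data block
  chains     = 4,          # four Markov chains with distinct starting values
  iter       = 4000,       # 4,000 iterations per chain
  warmup     = 2000,       # first 2,000 iterations used as warm-up
  algorithm  = "NUTS"      # Hamiltonian Monte Carlo, No-U-Turn Sampler
)

summ <- summary(fit)$summary
summ[ , c("Rhat", "n_eff")]  # convergence and information diagnostics
traceplot(fit)               # visual stationarity and mixing check
```

Under these settings, each chain contributes $2{,}000$ effective samples, and parameters with $\hat{R} > 1.05$ or low $n_{\text{eff}}$ would flag convergence or information problems.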
The number of uncorrelated sampling points indicates the amount of information held by the posterior distribution. Additionally, this assessment is complemented by the *effective sample size statistic* ($n_{\text{eff}}$) developed by Gelman and colleagues [-@Gelman_et_al_2014].

## Fitted models {#sec-fitted}

Twelve statistical models were derived from the general descriptions of the normal and beta-proportion linear, latent and mixed models outlined in @sec-normal_GLLAMM and @sec-beta_GLLAMM. The models differed in the response likelihood component, link function, and the structural equation part. For a comprehensive overview of the parametrization of the entire set of models, refer to @tbl-fitted.

Of particular interest is the comparison of models one and seven. The selected model resulting from this comparison aims to address [Research question 1] and [Research question 2]. Furthermore, the comparison and selection among all competing models, with particular interest in models six, nine and twelve, collectively serve to address [Research question 3].

+-------+--------------+---------+---------+----------------+-------------+-------------------+
|       | Response     | Robust  | Block   | Fixed effects  |             |                   |
+-------+--------------+---------+---------+----------------+-------------+-------------------+
| Model | distribution | model   | effects | $\beta_{HS[i]}$| $\beta_{A}$ | $\beta_{A,HS[i]}$ |
+:=====:+:============:+:=======:+:=======:+:==============:+:===========:+:=================:+
| 1     | Normal       | No      | Yes     | No             | No          | No                |
+-------+--------------+---------+---------+----------------+-------------+-------------------+
| 2     | Normal       | No      | Yes     | Yes            | Yes         | No                |
+-------+--------------+---------+---------+----------------+-------------+-------------------+
| 3     | Normal       | No      | Yes     | Yes            | No          | Yes               |
+-------+--------------+---------+---------+----------------+-------------+-------------------+
| 4     | Normal       | Yes     | Yes     | No             | No          | No                |
+-------+--------------+---------+---------+----------------+-------------+-------------------+
| 5     | Normal       | Yes     | Yes     | Yes            | Yes         | No                |
+-------+--------------+---------+---------+----------------+-------------+-------------------+
| 6     | Normal       | Yes     | Yes     | Yes            | No          | Yes               |
+-------+--------------+---------+---------+----------------+-------------+-------------------+
| 7     | Beta-prop.   | No      | Yes     | No             | No          | No                |
+-------+--------------+---------+---------+----------------+-------------+-------------------+
| 8     | Beta-prop.   | No      | Yes     | Yes            | Yes         | No                |
+-------+--------------+---------+---------+----------------+-------------+-------------------+
| 9     | Beta-prop.   | No      | Yes     | Yes            | No          | Yes               |
+-------+--------------+---------+---------+----------------+-------------+-------------------+
| 10    | Beta-prop.   | Yes     | Yes     | No             | No          | No                |
+-------+--------------+---------+---------+----------------+-------------+-------------------+
| 11    | Beta-prop.   | Yes     | Yes     | Yes            | Yes         | No                |
+-------+--------------+---------+---------+----------------+-------------+-------------------+
| 12    | Beta-prop.   | Yes     | Yes     | Yes            | No          | Yes               |
+-------+--------------+---------+---------+----------------+-------------+-------------------+

: Parametrization of fitted models. {#tbl-fitted .striped .hover}

The following tabset panel shows the STAN code of all fitted models. The code includes commentary for each section of the model. Furthermore, the models are implemented using non-centered priors (refer to @sec-hyperpriors).

::: {.panel-tabset}

## 1

```{r}
mcmc_code = "
data{
  // dimensions
  int N; // number of experimental runs
  int B; // max. number of blocks
  int I; // max. number of experimental units (children)
  int U; // max. number of sentences
  int W; // max. number of words
  // category numbers
  int cHS; // max. number of categories in HS
  // data
  array[N] int<lower=1, upper=B> bid; // block id
  array[N] int<lower=1, upper=I> cid; // child's id
  array[N] int<lower=1, upper=U> uid; // sentence's id
  array[N] int<lower=1, upper=cHS> HS; // hearing status
  array[N] real Am; // chron. age - min( chron. age )
  array[N] real Hwsib; // replicated entropies
}
parameters{
  // fixed effects parameters
  real a; // intercept
  // vector[cHS] aHS; // HS effects
  // real bAm; // Am effects
  // vector[cHS] bAmHS; // Am effects (per HS)
  // random effects parameters
  real m_b; // block RE mean
  real<lower=0> s_b; // block RE sd
  vector[B] z_b; // non-centered block RE
  real m_i; // child RE mean
  real<lower=0> s_i; // child RE sd
  vector[I] z_i; // non-centered child RE
  real m_u; // child, utterance RE mean
  real<lower=0> s_u; // child, utterance RE sd
  matrix[I,U] z_u; // non-centered child, utterance RE
  // variability parameters
  real<lower=0> s_w; // child, utterance, word sd (one overall)
  // vector<lower=0>[I] s_w; // child, utterance, word sd (one per child)
}
transformed parameters{
  // to track
  vector[B] b_i; // block random effects
  vector[I] e_i; // child random effects
  matrix[I,U] u_si; // sentence random effects
  vector[I] u_i; // sentence average random effects
  vector[I] SI; // SI index
  vector[N] mu; // NO TRACK
  // random effects
  b_i = m_b + s_b*z_b; // non-centered block RE
  e_i = m_i + s_i*z_i; // non-centered child RE
  u_si = m_u + s_u*z_u; // non-centered utterance RE
  // intelligibility and average entropy
  for(i in 1:I){
    u_i[ i ] = mean( u_si[ i, ] );
  }
  for(n in 1:N){
    SI[ cid[n] ] = a +
      // aHS[ HS[n] ] +
      // bAm*Am[n] +
      // bAmHS[ HS[n] ]*Am[n] +
      e_i[ cid[n] ] + u_i[ cid[n] ];
    mu[n] = b_i[ bid[n] ] - SI[ cid[n] ];
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  // aHS ~ normal( 0 , 0.3 );
  // bAm ~ normal( 0 , 0.1 );
  // bAmHS ~ normal( 0 , 0.1 );
  // random effects priors
  m_b ~ normal( -0.5 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( -0.5 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( -0.5 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();
  // variability priors
  s_w ~ exponential( 2 );
  // likelihood
  for(n in 1:N){
    Hwsib[n] ~ normal( mu[n] , s_w );
    // Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik; // log-likelihood
  for(n in 1:N){
    log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
    // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
  }
}
"
model_nam = "model01.stan"
writeLines( mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )
```

## 2

```{r}
mcmc_code = "
data{
  // dimensions
  int N; // number of experimental runs
  int B; // max. number of blocks
  int I; // max. number of experimental units (children)
  int U; // max. number of sentences
  int W; // max. number of words
  // category numbers
  int cHS; // max. number of categories in HS
  // data
  array[N] int<lower=1, upper=B> bid; // block id
  array[N] int<lower=1, upper=I> cid; // child's id
  array[N] int<lower=1, upper=U> uid; // sentence's id
  array[N] int<lower=1, upper=cHS> HS; // hearing status
  array[N] real Am; // chron. age - min( chron. age )
  array[N] real Hwsib; // replicated entropies
}
parameters{
  // fixed effects parameters
  real a; // intercept
  vector[cHS] aHS; // HS effects
  real bAm; // Am effects
  // vector[cHS] bAmHS; // Am effects (per HS)
  // random effects parameters
  real m_b; // block RE mean
  real<lower=0> s_b; // block RE sd
  vector[B] z_b; // non-centered block RE
  real m_i; // child RE mean
  real<lower=0> s_i; // child RE sd
  vector[I] z_i; // non-centered child RE
  real m_u; // child, utterance RE mean
  real<lower=0> s_u; // child, utterance RE sd
  matrix[I,U] z_u; // non-centered child, utterance RE
  // variability parameters
  real<lower=0> s_w; // child, utterance, word sd (one overall)
  // vector<lower=0>[I] s_w; // child, utterance, word sd (one per child)
}
transformed parameters{
  // to track
  vector[B] b_i; // block random effects
  vector[I] e_i; // child random effects
  matrix[I,U] u_si; // sentence random effects
  vector[I] u_i; // sentence average random effects
  vector[I] SI; // SI index
  vector[N] mu; // NO TRACK
  // random effects
  b_i = m_b + s_b*z_b; // non-centered block RE
  e_i = m_i + s_i*z_i; // non-centered child RE
  u_si = m_u + s_u*z_u; // non-centered utterance RE
  // intelligibility and average entropy
  for(i in 1:I){
    u_i[ i ] = mean( u_si[ i, ] );
  }
  for(n in 1:N){
    SI[ cid[n] ] = a + aHS[ HS[n] ] + bAm*Am[n] +
      // bAmHS[ HS[n] ]*Am[n] +
      e_i[ cid[n] ] + u_i[ cid[n] ];
    mu[n] = b_i[ bid[n] ] - SI[ cid[n] ];
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  aHS ~ normal( 0 , 0.3 );
  bAm ~ normal( 0 , 0.1 );
  // bAmHS ~ normal( 0 , 0.1 );
  // random effects priors
  m_b ~ normal( -0.5 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( -0.5 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( -0.5 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();
  // variability priors
  s_w ~ exponential( 2 );
  // likelihood
  for(n in 1:N){
    Hwsib[n] ~ normal( mu[n] , s_w );
    // Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik; // log-likelihood
  for(n in 1:N){
    log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
    // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
  }
}
"
model_nam = "model02.stan"
writeLines( mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )
```

## 3

```{r}
mcmc_code = "
data{
  // dimensions
  int N; // number of experimental runs
  int B; // max. number of blocks
  int I; // max. number of experimental units (children)
  int U; // max. number of sentences
  int W; // max. number of words
  // category numbers
  int cHS; // max. number of categories in HS
  // data
  array[N] int<lower=1, upper=B> bid; // block id
  array[N] int<lower=1, upper=I> cid; // child's id
  array[N] int<lower=1, upper=U> uid; // sentence's id
  array[N] int<lower=1, upper=cHS> HS; // hearing status
  array[N] real Am; // chron. age - min( chron. age )
  array[N] real Hwsib; // replicated entropies
}
parameters{
  // fixed effects parameters
  real a; // intercept
  vector[cHS] aHS; // HS effects
  // real bAm; // Am effects
  vector[cHS] bAmHS; // Am effects (per HS)
  // random effects parameters
  real m_b; // block RE mean
  real<lower=0> s_b; // block RE sd
  vector[B] z_b; // non-centered block RE
  real m_i; // child RE mean
  real<lower=0> s_i; // child RE sd
  vector[I] z_i; // non-centered child RE
  real m_u; // child, utterance RE mean
  real<lower=0> s_u; // child, utterance RE sd
  matrix[I,U] z_u; // non-centered child, utterance RE
  // variability parameters
  real<lower=0> s_w; // child, utterance, word sd (one overall)
  // vector<lower=0>[I] s_w; // child, utterance, word sd (one per child)
}
transformed parameters{
  // to track
  vector[B] b_i; // block random effects
  vector[I] e_i; // child random effects
  matrix[I,U] u_si; // sentence random effects
  vector[I] u_i; // sentence average random effects
  vector[I] SI; // SI index
  vector[N] mu; // NO TRACK
  // random effects
  b_i = m_b + s_b*z_b; // non-centered block RE
  e_i = m_i + s_i*z_i; // non-centered child RE
  u_si = m_u + s_u*z_u; // non-centered utterance RE
  // intelligibility and average entropy
  for(i in 1:I){
    u_i[ i ] = mean( u_si[ i, ] );
  }
  for(n in 1:N){
    SI[ cid[n] ] = a + aHS[ HS[n] ] +
      // bAm*Am[n] +
      bAmHS[ HS[n] ]*Am[n] +
      e_i[ cid[n] ] + u_i[ cid[n] ];
    mu[n] = b_i[ bid[n] ] - SI[ cid[n] ];
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  aHS ~ normal( 0 , 0.3 );
  // bAm ~ normal( 0 , 0.1 );
  bAmHS ~ normal( 0 , 0.1 );
  // random effects priors
  m_b ~ normal( -0.5 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( -0.5 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( -0.5 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();
  // variability priors
  s_w ~ exponential( 2 );
  // likelihood
  for(n in 1:N){
    Hwsib[n] ~ normal( mu[n] , s_w );
    // Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik; // log-likelihood
  for(n in 1:N){
    log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
    // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
  }
}
"
model_nam = "model03.stan"
writeLines( mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )
```

## 4

```{r}
mcmc_code = "
data{
  // dimensions
  int N; // number of experimental runs
  int B; // max. number of blocks
  int I; // max. number of experimental units (children)
  int U; // max. number of sentences
  int W; // max. number of words
  // category numbers
  int cHS; // max. number of categories in HS
  // data
  array[N] int<lower=1, upper=B> bid; // block id
  array[N] int<lower=1, upper=I> cid; // child's id
  array[N] int<lower=1, upper=U> uid; // sentence's id
  array[N] int<lower=1, upper=cHS> HS; // hearing status
  array[N] real Am; // chron. age - min( chron. age )
  array[N] real Hwsib; // replicated entropies
}
parameters{
  // fixed effects parameters
  real a; // intercept
  // vector[cHS] aHS; // HS effects
  // real bAm; // Am effects
  // vector[cHS] bAmHS; // Am effects (per HS)
  // random effects parameters
  real m_b; // block RE mean
  real<lower=0> s_b; // block RE sd
  vector[B] z_b; // non-centered block RE
  real m_i; // child RE mean
  real<lower=0> s_i; // child RE sd
  vector[I] z_i; // non-centered child RE
  real m_u; // child, utterance RE mean
  real<lower=0> s_u; // child, utterance RE sd
  matrix[I,U] z_u; // non-centered child, utterance RE
  // variability parameters
  // real<lower=0> s_w; // child, utterance, word sd (one overall)
  vector<lower=0>[I] s_w; // child, utterance, word sd (one per child)
}
transformed parameters{
  // to track
  vector[B] b_i; // block random effects
  vector[I] e_i; // child random effects
  matrix[I,U] u_si; // sentence random effects
  vector[I] u_i; // sentence average random effects
  vector[I] SI; // SI index
  vector[N] mu; // NO TRACK
  // random effects
  b_i = m_b + s_b*z_b; // non-centered block RE
  e_i = m_i + s_i*z_i; // non-centered child RE
  u_si = m_u + s_u*z_u; // non-centered utterance RE
  // intelligibility and average entropy
  for(i in 1:I){
    u_i[ i ] = mean( u_si[ i, ] );
  }
  for(n in 1:N){
    SI[ cid[n] ] = a +
      // aHS[ HS[n] ] +
      // bAm*Am[n] +
      // bAmHS[ HS[n] ]*Am[n] +
      e_i[ cid[n] ] + u_i[ cid[n] ];
    mu[n] = b_i[ bid[n] ] - SI[ cid[n] ];
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  // aHS ~ normal( 0 , 0.3 );
  // bAm ~ normal( 0 , 0.1 );
  // bAmHS ~ normal( 0 , 0.1 );
  // random effects priors
  m_b ~ normal( -0.5 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( -0.5 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( -0.5 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();
  // variability priors
  s_w ~ exponential( 2 );
  // likelihood
  for(n in 1:N){
    // Hwsib[n] ~ normal( mu[n] , s_w );
    Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik; // log-likelihood
  for(n in 1:N){
    // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
    log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
  }
}
"
model_nam = "model04.stan"
writeLines( mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )
```

## 5

```{r}
mcmc_code = "
data{
  // dimensions
  int N; // number of experimental runs
  int B; // max. number of blocks
  int I; // max. number of experimental units (children)
  int U; // max. number of sentences
  int W; // max. number of words
  // category numbers
  int cHS; // max. number of categories in HS
  // data
  array[N] int<lower=1, upper=B> bid; // block id
  array[N] int<lower=1, upper=I> cid; // child's id
  array[N] int<lower=1, upper=U> uid; // sentence's id
  array[N] int<lower=1, upper=cHS> HS; // hearing status
  array[N] real Am; // chron. age - min( chron. age )
  array[N] real Hwsib; // replicated entropies
}
parameters{
  // fixed effects parameters
  real a; // intercept
  vector[cHS] aHS; // HS effects
  real bAm; // Am effects
  // vector[cHS] bAmHS; // Am effects (per HS)
  // random effects parameters
  real m_b; // block RE mean
  real<lower=0> s_b; // block RE sd
  vector[B] z_b; // non-centered block RE
  real m_i; // child RE mean
  real<lower=0> s_i; // child RE sd
  vector[I] z_i; // non-centered child RE
  real m_u; // child, utterance RE mean
  real<lower=0> s_u; // child, utterance RE sd
  matrix[I,U] z_u; // non-centered child, utterance RE
  // variability parameters
  // real<lower=0> s_w; // child, utterance, word sd (one overall)
  vector<lower=0>[I] s_w; // child, utterance, word sd (one per child)
}
transformed parameters{
  // to track
  vector[B] b_i; // block random effects
  vector[I] e_i; // child random effects
  matrix[I,U] u_si; // sentence random effects
  vector[I] u_i; // sentence average random effects
  vector[I] SI; // SI index
  vector[N] mu; // NO TRACK
  // random effects
  b_i = m_b + s_b*z_b; // non-centered block RE
  e_i = m_i + s_i*z_i; // non-centered child RE
  u_si = m_u + s_u*z_u; // non-centered utterance RE
  // intelligibility and average entropy
  for(i in 1:I){
    u_i[ i ] = mean( u_si[ i, ] );
  }
  for(n in 1:N){
    SI[ cid[n] ] = a + aHS[ HS[n] ] + bAm*Am[n] +
      // bAmHS[ HS[n] ]*Am[n] +
      e_i[ cid[n] ] + u_i[ cid[n] ];
    mu[n] = b_i[ bid[n] ] - SI[ cid[n] ];
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  aHS ~ normal( 0 , 0.3 );
  bAm ~ normal( 0 , 0.1 );
  // bAmHS ~ normal( 0 , 0.1 );
  // random effects priors
  m_b ~ normal( -0.5 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( -0.5 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( -0.5 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();
  // variability priors
  s_w ~ exponential( 2 );
  // likelihood
  for(n in 1:N){
    // Hwsib[n] ~ normal( mu[n] , s_w );
    Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik; // log-likelihood
  for(n in 1:N){
    // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
    log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
  }
}
"
model_nam = "model05.stan"
writeLines( mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )
```

## 6

```{r}
mcmc_code = "
data{
  // dimensions
  int N; // number of experimental runs
  int B; // max. number of blocks
  int I; // max. number of experimental units (children)
  int U; // max. number of sentences
  int W; // max. number of words
  // category numbers
  int cHS; // max. number of categories in HS
  // data
  array[N] int<lower=1, upper=B> bid; // block id
  array[N] int<lower=1, upper=I> cid; // child's id
  array[N] int<lower=1, upper=U> uid; // sentence's id
  array[N] int<lower=1, upper=cHS> HS; // hearing status
  array[N] real Am; // chron. age - min( chron. age )
  array[N] real Hwsib; // replicated entropies
}
parameters{
  // fixed effects parameters
  real a; // intercept
  vector[cHS] aHS; // HS effects
  // real bAm; // Am effects
  vector[cHS] bAmHS; // Am effects (per HS)
  // random effects parameters
  real m_b; // block RE mean
  real<lower=0> s_b; // block RE sd
  vector[B] z_b; // non-centered block RE
  real m_i; // child RE mean
  real<lower=0> s_i; // child RE sd
  vector[I] z_i; // non-centered child RE
  real m_u; // child, utterance RE mean
  real<lower=0> s_u; // child, utterance RE sd
  matrix[I,U] z_u; // non-centered child, utterance RE
  // variability parameters
  // real<lower=0> s_w; // child, utterance, word sd (one overall)
  vector<lower=0>[I] s_w; // child, utterance, word sd (one per child)
}
transformed parameters{
  // to track
  vector[B] b_i; // block random effects
  vector[I] e_i; // child random effects
  matrix[I,U] u_si; // sentence random effects
  vector[I] u_i; // sentence average random effects
  vector[I] SI; // SI index
  vector[N] mu; // NO TRACK
  // random effects
  b_i = m_b + s_b*z_b; // non-centered block RE
  e_i = m_i + s_i*z_i; // non-centered child RE
  u_si = m_u + s_u*z_u; // non-centered utterance RE
  // intelligibility and average entropy
  for(i in 1:I){
    u_i[ i ] = mean( u_si[ i, ] );
  }
  for(n in 1:N){
    SI[ cid[n] ] = a + aHS[ HS[n] ] +
      // bAm*Am[n] +
      bAmHS[ HS[n] ]*Am[n] +
      e_i[ cid[n] ] + u_i[ cid[n] ];
    mu[n] = b_i[ bid[n] ] - SI[ cid[n] ];
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  aHS ~ normal( 0 , 0.3 );
  // bAm ~ normal( 0 , 0.1 );
  bAmHS ~ normal( 0 , 0.1 );
  // random effects priors
  m_b ~ normal( -0.5 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( -0.5 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( -0.5 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();
  // variability priors
  s_w ~ exponential( 2 );
  // likelihood
  for(n in 1:N){
    // Hwsib[n] ~ normal( mu[n] , s_w );
    Hwsib[n] ~ normal( mu[n] , s_w[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik; // log-likelihood
  for(n in 1:N){
    // log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w );
    log_lik[n] = normal_lpdf( Hwsib[n] | mu[n] , s_w[ cid[n] ] );
  }
}
"
model_nam = "model06.stan"
writeLines( mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )
```

## 7

```{r}
mcmc_code = "
data{
  // dimensions
  int N; // number of experimental runs
  int B; // max. number of blocks
  int I; // max. number of experimental units (children)
  int U; // max. number of sentences
  int W; // max. number of words
  // category numbers
  int cHS; // max. number of categories in HS
  // data
  array[N] int<lower=1, upper=B> bid; // block id
  array[N] int<lower=1, upper=I> cid; // child's id
  array[N] int<lower=1, upper=U> uid; // sentence's id
  array[N] int<lower=1, upper=cHS> HS; // hearing status
  array[N] real Am; // chron. age - min( chron. age )
  array[N] real Hwsib; // replicated entropies
}
parameters{
  // fixed effects parameters
  real a; // intercept
  // vector[cHS] aHS; // HS effects
  // real bAm; // Am effects
  // vector[cHS] bAmHS; // Am effects (per HS)
  // random effects parameters
  real m_b; // block RE mean
  real<lower=0> s_b; // block RE sd
  vector[B] z_b; // non-centered block RE
  real m_i; // child RE mean
  real<lower=0> s_i; // child RE sd
  vector[I] z_i; // non-centered child RE
  real m_u; // child, utterance RE mean
  real<lower=0> s_u; // child, utterance RE sd
  matrix[I,U] z_u; // non-centered child, utterance RE
  // variability parameters
  real<lower=0> Mw; // 'sample size' parameter
  // vector<lower=0>[I] Mw; // 'sample size' parameter
}
transformed parameters{
  // to track
  vector[B] b_i; // block random effects
  vector[I] e_i; // child random effects
  matrix[I,U] u_si; // sentence random effects
  vector[I] u_i; // sentence average random effects
  vector[I] SI; // SI index
  vector[N] mu; // NO TRACK
  // random effects
  b_i = m_b + s_b*z_b; // non-centered block RE
  e_i = m_i + s_i*z_i; // non-centered child RE
  u_si = m_u + s_u*z_u; // non-centered utterance RE
  // intelligibility and average entropy
  for(i in 1:I){
    u_i[ i ] = mean( u_si[ i, ] );
  }
  for(n in 1:N){
    SI[ cid[n] ] = a +
      // aHS[ HS[n] ] +
      // bAm*Am[n] +
      // bAmHS[ HS[n] ]*Am[n] +
      e_i[ cid[n] ] + u_i[ cid[n] ];
    mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  // aHS ~ normal( 0 , 0.3 );
  // bAm ~ normal( 0 , 0.1 );
  // bAmHS ~ normal( 0 , 0.1 );
  // random effects priors
  m_b ~ normal( 0 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( 0 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( 0 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();
  // variability priors
  Mw ~ exponential( 0.5 );
  // likelihood
  for(n in 1:N){
    Hwsib[n] ~ beta_proportion( mu[n] , Mw );
    // Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik; // log-likelihood
  for(n in 1:N){
    log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
    // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
  }
}
"
model_nam = "model07.stan"
writeLines( mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )
```

## 8

```{r}
mcmc_code = "
data{
  // dimensions
  int N; // number of experimental runs
  int B; // max. number of blocks
  int I; // max. number of experimental units (children)
  int U; // max. number of sentences
  int W; // max. number of words
  // category numbers
  int cHS; // max. number of categories in HS
  // data
  array[N] int<lower=1, upper=B> bid; // block id
  array[N] int<lower=1, upper=I> cid; // child's id
  array[N] int<lower=1, upper=U> uid; // sentence's id
  array[N] int<lower=1, upper=cHS> HS; // hearing status
  array[N] real Am; // chron. age - min( chron. age )
  array[N] real Hwsib; // replicated entropies
}
parameters{
  // fixed effects parameters
  real a; // intercept
  vector[cHS] aHS; // HS effects
  real bAm; // Am effects
  // vector[cHS] bAmHS; // Am effects (per HS)
  // random effects parameters
  real m_b; // block RE mean
  real<lower=0> s_b; // block RE sd
  vector[B] z_b; // non-centered block RE
  real m_i; // child RE mean
  real<lower=0> s_i; // child RE sd
  vector[I] z_i; // non-centered child RE
  real m_u; // child, utterance RE mean
  real<lower=0> s_u; // child, utterance RE sd
  matrix[I,U] z_u; // non-centered child, utterance RE
  // variability parameters
  real<lower=0> Mw; // 'sample size' parameter
  // vector<lower=0>[I] Mw; // 'sample size' parameter
}
transformed parameters{
  // to track
  vector[B] b_i; // block random effects
  vector[I] e_i; // child random effects
  matrix[I,U] u_si; // sentence random effects
  vector[I] u_i; // sentence average random effects
  vector[I] SI; // SI index
  vector[N] mu; // NO TRACK
  // random effects
  b_i = m_b + s_b*z_b; // non-centered block RE
  e_i = m_i + s_i*z_i; // non-centered child RE
  u_si = m_u + s_u*z_u; // non-centered utterance RE
  // intelligibility and average entropy
  for(i in 1:I){
    u_i[ i ] = mean( u_si[ i, ] );
  }
  for(n in 1:N){
    SI[ cid[n] ] = a + aHS[ HS[n] ] + bAm*Am[n] +
      // bAmHS[ HS[n] ]*Am[n] +
      e_i[ cid[n] ] + u_i[ cid[n] ];
    mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  aHS ~ normal( 0 , 0.3 );
  bAm ~ normal( 0 , 0.1 );
  // bAmHS ~ normal( 0 , 0.1 );
  // random effects priors
  m_b ~ normal( 0 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( 0 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( 0 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();
  // variability priors
  Mw ~ exponential( 0.5 );
  // likelihood
  for(n in 1:N){
    Hwsib[n] ~ beta_proportion( mu[n] , Mw );
    // Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik; // log-likelihood
  for(n in 1:N){
    log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
    // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
  }
}
"
model_nam = "model08.stan"
writeLines( mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )
```

## 9

```{r}
mcmc_code = "
data{
  // dimensions
  int N; // number of experimental runs
  int B; // max. number of blocks
  int I; // max. number of experimental units (children)
  int U; // max. number of sentences
  int W; // max. number of words
  // category numbers
  int cHS; // max. number of categories in HS
  // data
  array[N] int<lower=1, upper=B> bid; // block id
  array[N] int<lower=1, upper=I> cid; // child's id
  array[N] int<lower=1, upper=U> uid; // sentence's id
  array[N] int<lower=1, upper=cHS> HS; // hearing status
  array[N] real Am; // chron. age - min( chron. age )
  array[N] real Hwsib; // replicated entropies
}
parameters{
  // fixed effects parameters
  real a; // intercept
  vector[cHS] aHS; // HS effects
  // real bAm; // Am effects
  vector[cHS] bAmHS; // Am effects (per HS)
  // random effects parameters
  real m_b; // block RE mean
  real<lower=0> s_b; // block RE sd
  vector[B] z_b; // non-centered block RE
  real m_i; // child RE mean
  real<lower=0> s_i; // child RE sd
  vector[I] z_i; // non-centered child RE
  real m_u; // child, utterance RE mean
  real<lower=0> s_u; // child, utterance RE sd
  matrix[I,U] z_u; // non-centered child, utterance RE
  // variability parameters
  real<lower=0> Mw; // 'sample size' parameter
  // vector<lower=0>[I] Mw; // 'sample size' parameter
}
transformed parameters{
  // to track
  vector[B] b_i; // block random effects
  vector[I] e_i; // child random effects
  matrix[I,U] u_si; // sentence random effects
  vector[I] u_i; // sentence average random effects
  vector[I] SI; // SI index
  vector[N] mu; // NO TRACK
  // random effects
  b_i = m_b + s_b*z_b; // non-centered block RE
  e_i = m_i + s_i*z_i; // non-centered child RE
  u_si = m_u + s_u*z_u; // non-centered utterance RE
  // intelligibility and average entropy
  for(i in 1:I){
    u_i[ i ] = mean( u_si[ i, ] );
  }
  for(n in 1:N){
    SI[ cid[n] ] = a + aHS[ HS[n] ] +
      // bAm*Am[n] +
      bAmHS[ HS[n] ]*Am[n] +
      e_i[ cid[n] ] + u_i[ cid[n] ];
    mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  aHS ~ normal( 0 , 0.3 );
  // bAm ~ normal( 0 , 0.1 );
  bAmHS ~ normal( 0 , 0.1 );
  // random effects priors
  m_b ~ normal( 0 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( 0 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( 0 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();
  // variability priors
  Mw ~ exponential( 0.5 );
  // likelihood
  for(n in 1:N){
    Hwsib[n] ~ beta_proportion( mu[n] , Mw );
    // Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik; // log-likelihood
  for(n in 1:N){
    log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
    // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
  }
}
"
model_nam = "model09.stan"
writeLines( mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )
```

## 10

```{r}
mcmc_code = "
data{
  // dimensions
  int N; // number of experimental runs
  int B; // max. number of blocks
  int I; // max. number of experimental units (children)
  int U; // max. number of sentences
  int W; // max. number of words
  // category numbers
  int cHS; // max. number of categories in HS
  // data
  array[N] int<lower=1, upper=B> bid; // block id
  array[N] int<lower=1, upper=I> cid; // child's id
  array[N] int<lower=1, upper=U> uid; // sentence's id
  array[N] int<lower=1, upper=cHS> HS; // hearing status
  array[N] real Am; // chron. age - min( chron. age )
  array[N] real Hwsib; // replicated entropies
}
parameters{
  // fixed effects parameters
  real a; // intercept
  // vector[cHS] aHS; // HS effects
  // real bAm; // Am effects
  // vector[cHS] bAmHS; // Am effects (per HS)
  // random effects parameters
  real m_b; // block RE mean
  real<lower=0> s_b; // block RE sd
  vector[B] z_b; // non-centered block RE
  real m_i; // child RE mean
  real<lower=0> s_i; // child RE sd
  vector[I] z_i; // non-centered child RE
  real m_u; // child, utterance RE mean
  real<lower=0> s_u; // child, utterance RE sd
  matrix[I,U] z_u; // non-centered child, utterance RE
  // variability parameters
  // real<lower=0> Mw; // 'sample size' parameter
  vector<lower=0>[I] Mw; // 'sample size' parameter
}
transformed parameters{
  // to track
  vector[B] b_i; // block random effects
  vector[I] e_i; // child random effects
  matrix[I,U] u_si; // sentence random effects
  vector[I] u_i; // sentence average random effects
  vector[I] SI; // SI index
  vector[N] mu; // NO TRACK
  // random effects
  b_i = m_b + s_b*z_b; // non-centered block RE
  e_i = m_i + s_i*z_i; // non-centered child RE
  u_si = m_u + s_u*z_u; // non-centered utterance RE
  // intelligibility and average entropy
  for(i in 1:I){
    u_i[ i ] = mean( u_si[ i, ] );
  }
  for(n in 1:N){
    SI[ cid[n] ] = a +
      // aHS[ HS[n] ] +
      // bAm*Am[n] +
      // bAmHS[ HS[n] ]*Am[n] +
      e_i[ cid[n] ] + u_i[ cid[n] ];
    mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  // aHS ~ normal( 0 , 0.3 );
  // bAm ~ normal( 0 , 0.1 );
  // bAmHS ~ normal( 0 , 0.1 );
  // random effects priors
  m_b ~ normal( 0 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( 0 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( 0 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();
  // variability priors
  Mw ~ exponential( 0.5 );
  // likelihood
  for(n in 1:N){
    // Hwsib[n] ~ beta_proportion( mu[n] , Mw );
    Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik; // log-likelihood
  for(n in 1:N){
    // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
    log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
  }
}
"
model_nam = "model10.stan"
writeLines( mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )
```

## 11

```{r}
mcmc_code = "
data{
  // dimensions
  int N; // number of experimental runs
  int B; // max. number of blocks
  int I; // max. number of experimental units (children)
  int U; // max. number of sentences
  int W; // max. number of words
  // category numbers
  int cHS; // max. number of categories in HS
  // data
  array[N] int<lower=1, upper=B> bid; // block id
  array[N] int<lower=1, upper=I> cid; // child's id
  array[N] int<lower=1, upper=U> uid; // sentence's id
  array[N] int<lower=1, upper=cHS> HS; // hearing status
  array[N] real Am; // chron. age - min( chron. age )
  array[N] real Hwsib; // replicated entropies
}
parameters{
  // fixed effects parameters
  real a; // intercept
  vector[cHS] aHS; // HS effects
  real bAm; // Am effects
  // vector[cHS] bAmHS; // Am effects (per HS)
  // random effects parameters
  real m_b; // block RE mean
  real<lower=0> s_b; // block RE sd
  vector[B] z_b; // non-centered block RE
  real m_i; // child RE mean
  real<lower=0> s_i; // child RE sd
  vector[I] z_i; // non-centered child RE
  real m_u; // child, utterance RE mean
  real<lower=0> s_u; // child, utterance RE sd
  matrix[I,U] z_u; // non-centered child, utterance RE
  // variability parameters
  // real<lower=0> Mw; // 'sample size' parameter
  vector<lower=0>[I] Mw; // 'sample size' parameter
}
transformed parameters{
  // to track
  vector[B] b_i; // block random effects
  vector[I] e_i; // child random effects
  matrix[I,U] u_si; // sentence random effects
  vector[I] u_i; // sentence average random effects
  vector[I] SI; // SI index
  vector[N] mu; // NO TRACK
  // random effects
  b_i = m_b + s_b*z_b; // non-centered block RE
  e_i = m_i + s_i*z_i; // non-centered child RE
  u_si = m_u + s_u*z_u; // non-centered utterance RE
  // intelligibility and average entropy
  for(i in 1:I){
    u_i[ i ] = mean( u_si[ i, ] );
  }
  for(n in 1:N){
    SI[ cid[n] ] = a + aHS[ HS[n] ] + bAm*Am[n] +
      // bAmHS[ HS[n] ]*Am[n] +
      e_i[ cid[n] ] + u_i[ cid[n] ];
    mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  aHS ~ normal( 0 , 0.3 );
  bAm ~ normal( 0 , 0.1 );
  // bAmHS ~ normal( 0 , 0.1 );
  // random effects priors
  m_b ~ normal( 0 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( 0 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( 0 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();
  // variability priors
  Mw ~ exponential( 0.5 );
  // likelihood
  for(n in 1:N){
    // Hwsib[n] ~ beta_proportion( mu[n] , Mw );
    Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik; // log-likelihood
  for(n in 1:N){
    // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
    log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
  }
}
"
model_nam = "model11.stan"
writeLines( mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )
```

## 12

```{r}
mcmc_code = "
data{
  // dimensions
  int N; // number of experimental runs
  int B; // max. number of blocks
  int I; // max. number of experimental units (children)
  int U; // max. number of sentences
  int W; // max. number of words
  // category numbers
  int cHS; // max. number of categories in HS
  // data
  array[N] int<lower=1, upper=B> bid; // block id
  array[N] int<lower=1, upper=I> cid; // child's id
  array[N] int<lower=1, upper=U> uid; // sentence's id
  array[N] int<lower=1, upper=cHS> HS; // hearing status
  array[N] real Am; // chron. age - min( chron. age )
  array[N] real Hwsib; // replicated entropies
}
parameters{
  // fixed effects parameters
  real a; // intercept
  vector[cHS] aHS; // HS effects
  // real bAm; // Am effects
  vector[cHS] bAmHS; // Am effects (per HS)
  // random effects parameters
  real m_b; // block RE mean
  real<lower=0> s_b; // block RE sd
  vector[B] z_b; // non-centered block RE
  real m_i; // child RE mean
  real<lower=0> s_i; // child RE sd
  vector[I] z_i; // non-centered child RE
  real m_u; // child, utterance RE mean
  real<lower=0> s_u; // child, utterance RE sd
  matrix[I,U] z_u; // non-centered child, utterance RE
  // variability parameters
  // real<lower=0> Mw; // 'sample size' parameter
  vector<lower=0>[I] Mw; // 'sample size' parameter
}
transformed parameters{
  // to track
  vector[B] b_i; // block random effects
  vector[I] e_i; // child random effects
  matrix[I,U] u_si; // sentence random effects
  vector[I] u_i; // sentence average random effects
  vector[I] SI; // SI index
  vector[N] mu; // NO TRACK
  // random effects
  b_i = m_b + s_b*z_b; // non-centered block RE
  e_i = m_i + s_i*z_i; // non-centered child RE
  u_si = m_u + s_u*z_u; // non-centered utterance RE
  // intelligibility and average entropy
  for(i in 1:I){
    u_i[ i ] = mean( u_si[ i, ] );
  }
  for(n in 1:N){
    SI[ cid[n] ] = a + aHS[ HS[n] ] +
      // bAm*Am[n] +
      bAmHS[ HS[n] ]*Am[n] +
      e_i[ cid[n] ] + u_i[ cid[n] ];
    mu[n] = inv_logit( b_i[ bid[n] ] - SI[ cid[n] ] );
  }
}
model{
  // fixed effects priors
  a ~ normal( 0 , 0.05 );
  aHS ~ normal( 0 , 0.3 );
  // bAm ~ normal( 0 , 0.1 );
  bAmHS ~ normal( 0 , 0.1 );
  // random effects priors
  m_b ~ normal( 0 , 0.05 );
  s_b ~ exponential( 2 );
  z_b ~ std_normal();
  m_i ~ normal( 0 , 0.05 );
  s_i ~ exponential( 2 );
  z_i ~ std_normal();
  m_u ~ normal( 0 , 0.05 );
  s_u ~ exponential( 2 );
  to_vector( z_u ) ~ std_normal();
  // variability priors
  Mw ~ exponential( 0.5 );
  // likelihood
  for(n in 1:N){
    // Hwsib[n] ~ beta_proportion( mu[n] , Mw );
    Hwsib[n] ~ beta_proportion( mu[n] , Mw[ cid[n] ] );
  }
}
generated quantities{
  // track
  vector[N] log_lik; // log-likelihood
  for(n in 1:N){
    // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw );
    log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] );
  }
}
"
model_nam = "model12.stan"
writeLines( mcmc_code, con=file.path(save_dir, 'real_models', model_nam) )
```
log-likelihood for(n in 1:N){ // log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw ); log_lik[n] = beta_proportion_lpdf( Hwsib[n] | mu[n] , Mw[ cid[n] ] ); }}"model_nam ="model12.stan"writeLines(mcmc_code, con=file.path(save_dir, 'real_models', model_nam) ) ```:::Furthermore, the following code can be used to fit all the STAN models, changing the appropriate model name.::: {.panel-tabset}## code ## fitting```{r}model_nam ="modelXX.stan"model_in =file.path(getwd(), 'real_models')model_out =file.path(getwd(), 'real_chain')mod =cmdstan_model( file.path(model_in, model_nam) )print(model_nam)mod$sample( data=dlist,output_dir=model_out,output_basename =str_replace(model_nam, '.stan', ''),chains=4, parallel_chains=4,max_treedepth=20, adapt_delta=0.95)```:::## Model selection method and criteriaThe current research used the *Information-Theoretic Approach* [@Anderson_2008; @Chamberlain_1965] for model selection and inference. The *Information-Theoretic Approach* is composed of three steps: (1) the expression of the research hypothesis into statistical models, (2) the selection of the most plausible models, and (3) the production of inferences based on one or multiple selected models. The first step of the approach is developed in @sec-models, the second in @sec-fitted, and the third in @sec-results.::: {.column-margin}**_Information-Theoretic Approach_**Approach to model selection and inference composed of three steps: 1. Express research hypothesis into models, 2. Select most plausible models,3. Produce inferences based on selected models:::Related to second step, the selected criteria for choosing among competing models were the *widely applicable information criterion (WAIC)* [@Watanabe_2013] and the *pareto-smoothed importance sampling cross-validation criterion (PSIS)*, developed by Vehtari and colleagues [-@Vehtari_et_al_2021]. The use of *WAIC* and *PSIS* is justified based on two fundamental characteristics of the criteria. 
First, WAIC and PSIS incorporate all the information available in the posterior distribution of the parameters, effectively integrating all the uncertainty inherent in the parameter estimates. Second, the criteria provide the most accurate approximation to the cross-validated deviance [@McElreath_2020], which serves as the closest estimate of the Kullback-Leibler divergence [@Kullback_et_al_1951]. The Kullback-Leibler divergence measures the degree to which a model accurately represents the actual distribution of the data.

::: {.column-margin}
**_Reasons to use WAIC and PSIS_**

The criteria:

1. Incorporate all the uncertainty of the posterior distribution, and
2. Evaluate the degree to which each model deviates from achieving *perfect predictive accuracy* for the data.
:::

Consequently, by comparing the criteria values across different models, this study evaluates the degree to which each model deviates from achieving *perfect predictive accuracy* for the data [@McElreath_2020]. Specifically, models with lower WAIC or PSIS values deviate less from *perfect predictive accuracy* than alternative models. Additionally, this evaluation provides an indication of the level of uncertainty associated with the findings.

# Results {#sec-results}

## Bayesian inference results {#sec-bayesian_results}

In this section, the results of the Bayesian inference procedures are presented. Models $6$ and $12$ are selected as representative, as they have the largest number of parameters among all the models (refer to @sec-fitted). The selected models are then used to illustrate the quality of the Bayesian *estimates*, in terms of stationarity, convergence, mixing, and the information available in the posterior distribution of the parameters. It is crucial to emphasize that the authors conducted a meticulous inspection of all the fitted models; all models achieved results similar to those selected for illustration.

The following code loads the model information.
The function `file_id()` is a user-defined function that identifies, within a particular directory, the files generated by Stan as a result of the fitting process.

::: {.panel-tabset}

## code

## loading

```{r}
# load reference models
model_nam = "modelXX"
model_out = file.path( save_dir, 'real_chain')
model_fit = file_id( model_out, model_nam )
modelXX = rstan::read_stan_csv( file.path( model_out, model_fit ) )
```

:::

```{r}
#| label: code-usd
#| fig-cap: ''
#| echo: false
#| warning: false
# load functions
save_dir = '/home/josema/Desktop/1. Work/1 research/PhD Antwerp/#thesis/paper1'
source( file.path( save_dir, 'real_code', '0_sim_extra.R') )
```

```{r}
#| label: code-models_load
#| fig-cap: ''
#| echo: false
#| warning: false
# load reference models
for(i in 1:12){
  model_nam = paste0( ifelse(i<10, 'model0', 'model'), i)
  model_out = file.path( save_dir, 'real_chain')
  model_fit = file_id( model_out, model_nam )
  assign( model_nam, rstan::read_stan_csv( file.path( model_out, model_fit ) ) )
}
```

Furthermore, the following code displays a concise summary of the parameter estimates for the selected models.

```{r}
#| label: code-recovery_models
#| fig-cap: 'Stationarity, convergence and mixing'
#| echo: true
#| warning: false
parameters = c( 'a','aHS','bAm','bAmHS', # <1>
                'm_b','s_b','m_i','s_i','m_u','s_u','s_w','Mw')
model06_parameters = parameter_recovery( # <2>
  stan_object = model06,
  est_par = parameters,
  p = 0.95 )
model06_parameters = model06_parameters[-c(6:7),]
model12_parameters = parameter_recovery( # <3>
  stan_object = model12,
  est_par = parameters,
  p = 0.95 )
model12_parameters = model12_parameters[-c(6:7),]
```
1. parameters of interest
2. user-defined function: displays concise parameter estimate information for the selected model
3. user-defined function: displays concise parameter estimate information for the selected model

### Stationarity, convergence and mixing {#sec-stationarity_results}

@fig-stationarity_plot1 through @fig-stationarity_plot8 show the trace, trace-rank, and ACF plots for selected parameters of the chosen models. First, the left panels of the figures display the trace plots. These plots illustrate that, for each parameter, the chains' post-warm-up iterations visually converge to a mean value with a constant variance. Second, the middle panels of the figures show the trace-rank plots. These panels reveal that the chains of each parameter explored their respective parameter space in a seemingly random manner. Third, the right panels of the figures show the ACF plots. These panels demonstrate that the chains of each parameter have relatively low autocorrelations.

Furthermore, @fig-Rhats shows the $\hat{R}$ statistic for each parameter of the selected models. The figure reveals that the statistic supports the assessment of convergence of the chains' post-warm-up iterations for each parameter, as no parameter's $\hat{R}$ exceeds the threshold of $1.05$.

```{r}
#| label: fig-stationarity_plot1
#| fig-cap: 'Model 06, trace, trace rank and ACF plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 6
#| fig-width: 10
tri_plot( stan_object=model06, # <1>
          pars=c('a','aHS[1]','aHS[2]') )
```
1. user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-stationarity_plot2
#| fig-cap: 'Model 12, trace, trace rank and ACF plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
tri_plot( stan_object=model12, # <1>
          pars=c('bAmHS[1]','bAmHS[2]') )
```
1. user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-stationarity_plot3
#| fig-cap: 'Model 06, trace, trace rank and ACF plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
tri_plot( stan_object=model06, # <1>
          pars=c('m_b','s_b') )
```
1. user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-stationarity_plot4
#| fig-cap: 'Model 12, trace, trace rank and ACF plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
tri_plot( stan_object=model12, # <1>
          pars=c('m_i','s_i') )
```
1. user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-stationarity_plot5
#| fig-cap: 'Model 06, trace, trace rank and ACF plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
tri_plot( stan_object=model06, # <1>
          pars=c('m_u','s_u') )
```
1. user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-stationarity_plot6
#| fig-cap: 'Model 12, trace, trace rank and ACF plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 11
#| fig-width: 10
tri_plot( stan_object=model12, # <1>
          pars=paste0('Mw[', 1:5,']') )
```
1. user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-stationarity_plot7
#| fig-cap: 'Model 06, trace, trace rank and ACF plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 11
#| fig-width: 10
tri_plot( stan_object=model06, # <1>
          pars=paste0('s_w[', 1:5,']') )
```
1. 
user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-stationarity_plot8
#| fig-cap: 'Model 12, trace, trace rank and ACF plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 11
#| fig-width: 10
tri_plot( stan_object=model12, # <1>
          pars=paste0('SI[', 1:5, ']') )
```
1. user-defined function: generation of trace, trace rank, and ACF plots for selected parameters within a model

```{r}
#| label: fig-Rhats
#| fig-cap: 'Selected models, Rhat values'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
par( mfrow=c(1,2) )
plot( 1:nrow(model06_parameters), model06_parameters$Rhat4, # <1>
      ylim=c(0.95, 1.1), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n', xlab='', ylab='Rhat', main='Normal GLLAMM: model 06')
axis( side=1, at=1:nrow(model06_parameters), # <2>
      labels=rownames(model06_parameters),
      cex.axis=0.8, las=2 )
abline( h=1.05, lty=2 ) # <3>
plot( 1:nrow(model12_parameters), model12_parameters$Rhat4, # <4>
      ylim=c(0.95, 1.1), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n', xlab='', ylab='Rhat', main='Beta-proportion GLLAMM: model 12')
axis( side=1, at=1:nrow(model12_parameters), # <5>
      labels=rownames(model12_parameters),
      cex.axis=0.8, las=2 )
abline( h=1.05, lty=2 ) # <6>
par( mfrow=c(1,1) )
```
1. model 06: Rhat values plot
2. model 06: parameter names on the x-axis
3. model 06: convergence threshold
4. model 12: Rhat values plot
5. model 12: parameter names on the x-axis
6. model 12: convergence threshold

### Posterior information {#sec-posterior_results}

@fig-histogram1 through @fig-histogram5 show the density plots for selected parameters of the chosen models. The figures reveal that the parameters' posterior distributions carry sufficient information, implying that they were generated from a sufficient number of effectively uncorrelated sampling points. Furthermore, @fig-neff shows the plots of the *effective sample size statistic* ($n_{\text{eff}}$) for each parameter.
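As a reminder of what $n_{\text{eff}}$ measures, the following toy sketch (an artificial AR(1) chain, unrelated to the fitted chains) applies the usual approximation $n_{\text{eff}} \approx N / (1 + 2\sum_k \rho_k)$, truncating the autocorrelation sum at the first non-positive lag:

```r
# Toy illustration only: effective sample size of an autocorrelated chain.
set.seed(1)
N <- 4000
x <- as.numeric( arima.sim( model=list(ar=0.6), n=N ) )  # AR(1) draws
rho <- acf( x, lag.max=100, plot=FALSE )$acf[-1]         # drop lag 0
k <- which( rho <= 0 )[1]                                # first non-positive lag
if( !is.na(k) ) rho <- rho[ seq_len(k - 1) ]             # crude truncation
N / ( 1 + 2*sum(rho) )                                   # far fewer effective draws than N
```

For an AR(1) chain with coefficient $0.6$, the theoretical reduction factor is $(1-0.6)/(1+0.6) = 0.25$, so only about a quarter of the nominal draws are effectively independent; this is the sense in which $n_{\text{eff}}$, not the raw iteration count, indexes the information in a posterior sample.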
The left and right panels of the figure demonstrate that most parameters have $n_{\text{eff}} > 2000$, with a few others falling below the $2000$ post-warm-up iterations set by the Bayesian procedure (refer to @sec-estimation). In aggregate, this provides supporting evidence that the posterior distributions carry sufficient information about the parameters, and that they make substantive sense compared to the models' prior beliefs.

On the other hand, the figures also reveal that all the posteriors are unimodal distributions with values centered around a mean. This provides additional evidence of the stationarity and convergence of the distributions, as described in @sec-stationarity_results.

```{r}
#| label: fig-histogram1
#| fig-cap: 'Model 06, density plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 6
#| fig-width: 10
pars = c('a','aHS[1]','aHS[2]','bAmHS[1]','bAmHS[2]') # <1>
par( mfrow=c(2,3) )
dens_plot( stan_object=model06, p=0.95, # <2>
           pars=pars[1], x_lab=pars[1] )
for( i in 2:length(pars) ){
  dens_plot( stan_object=model06, p=0.95,
             pars=pars[i], x_lab=pars[i] )
}
par( mfrow=c(1,1) )
```
1. parameters of interest
2. user-defined function: generation of density plot with HPDI confidence intervals for selected parameters

```{r}
#| label: fig-histogram2
#| fig-cap: 'Model 12, density plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 6
#| fig-width: 10
pars = c('m_b','m_i','m_u','s_b','s_i','s_u') # <1>
par( mfrow=c(2,3) )
dens_plot( stan_object=model12, p=0.95, # <2>
           pars=pars[1], x_lab=pars[1] )
for( i in 2:length(pars) ){
  dens_plot( stan_object=model12, p=0.95,
             pars=pars[i], x_lab=pars[i] )
}
par( mfrow=c(1,1) )
```
1. parameters of interest
2. user-defined function: generation of density plot with HPDI confidence intervals for selected parameters

```{r}
#| label: fig-histogram3
#| fig-cap: 'Model 06, density plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 6
#| fig-width: 10
pars = paste0('s_w[',1:6,']') # <1>
par( mfrow=c(2,3) )
dens_plot( stan_object=model06, p=0.95, # <2>
           pars=pars[1], x_lab=pars[1] )
for( i in 2:length(pars) ){
  dens_plot( stan_object=model06, p=0.95,
             pars=pars[i], x_lab=pars[i] )
}
par( mfrow=c(1,1) )
```
1. parameters of interest
2. user-defined function: generation of density plot with HPDI confidence intervals for selected parameters

```{r}
#| label: fig-histogram4
#| fig-cap: 'Model 12, density plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 6
#| fig-width: 10
pars = paste0('Mw[',1:6,']') # <1>
par( mfrow=c(2,3) )
dens_plot( stan_object=model12, p=0.95, # <2>
           pars=pars[1], x_lab=pars[1] )
for( i in 2:length(pars) ){
  dens_plot( stan_object=model12, p=0.95,
             pars=pars[i], x_lab=pars[i] )
}
par( mfrow=c(1,1) )
```
1. parameters of interest
2. user-defined function: generation of density plot with HPDI confidence intervals for selected parameters

```{r}
#| label: fig-histogram5
#| fig-cap: 'Model 06, density plots of selected parameters'
#| echo: true
#| warning: false
#| fig-height: 6
#| fig-width: 10
pars = paste0('SI[',1:6,']') # <1>
par( mfrow=c(2,3) )
dens_plot( stan_object=model06, p=0.95, # <2>
           pars=pars[1], x_lab=pars[1] )
for( i in 2:length(pars) ){
  dens_plot( stan_object=model06, p=0.95,
             pars=pars[i], x_lab=pars[i] )
}
par( mfrow=c(1,1) )
```
1. parameters of interest
2. user-defined function: generation of density plot with HPDI confidence intervals for selected parameters

```{r}
#| label: fig-neff
#| fig-cap: 'Selected models, Neff values'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
par( mfrow=c(1,2) )
plot( 1:nrow(model06_parameters), model06_parameters$n_eff, # <1>
      ylim=c(0,10000), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n', xlab='', ylab='neff', main='Normal GLLAMM: model 06')
axis( side=1, at=1:nrow(model06_parameters), # <2>
      labels=rownames(model06_parameters),
      cex.axis=0.8, las=2 )
abline( h=c(0, 1000, 2000), lty=2 ) # <3>
plot( 1:nrow(model12_parameters), model12_parameters$n_eff, # <4>
      ylim=c(0,10000), pch=19, col=rgb(0,0,0,alpha=0.3),
      xaxt='n', xlab='', ylab='neff', main='Beta-proportion GLLAMM: model 12')
axis( side=1, at=1:nrow(model12_parameters), # <5>
      labels=rownames(model12_parameters),
      cex.axis=0.8, las=2 )
abline( h=c(0, 1000, 2000), lty=2 ) # <6>
par( mfrow=c(1,1) )
```
1. model 06: neff values plot
2. model 06: parameter names on the x-axis
3. model 06: convergence threshold
4. model 12: neff values plot
5. model 12: parameter names on the x-axis
6. model 12: convergence threshold

## Research question 1 {#sec-RQ1_results}

talk about:

1. selection of models
2. a comparison of predictions
3. the variability

- WAIC and PSIS indicate that beta model 7 is considerably better. Why? Because its predictions fall within the limits of the data (the other model shows a clear sign of underfitting).
- Most variability is at the word level. If this is not controlled, the precision of the tests is overestimated, because each observation counts as less than one effective piece of information, since the observations are correlated.
- The unexplained variability of children and sentences follows in terms of amount; this describes how much variability is expected from each source.
- The variability of the block effects indicates that there might be some lack of inter-listener reliability.
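The note above, that correlated observations each carry less than one effective piece of information, can be made concrete with the classic design-effect formula (a generic illustration, not part of the analysis code; `effective_n` is a hypothetical helper):

```r
# Generic illustration: with m observations per cluster and intraclass
# correlation icc, n observations carry only n / (1 + (m - 1) * icc)
# effective pieces of information (Kish's design effect).
effective_n <- function(n, m, icc) n / ( 1 + (m - 1)*icc )
effective_n( n=1000, m=10, icc=0 )    # independent data: 1000
effective_n( n=1000, m=10, icc=0.5 )  # strongly clustered: ~182
```

Ignoring the clustering therefore behaves as if roughly five times more independent evidence were available than the data actually contain, which is the sense in which uncontrolled word-level variability would overstate the precision of the tests.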
If this lack of reliability were not controlled, it might creep into the hypothesis tests of the parameters.

```{r}
#| label: code-RQ1_WAIC
#| fig-cap: ''
#| echo: true
#| warning: false
require(rethinking) # <1>
set.seed(12345) # <2>
RQ1_WAIC = compare( func=WAIC, # <3>
                    model01, model04, model07, model10 )
round( RQ1_WAIC, 3 ) # <4>
```
1. package requirement
2. seed for replication
3. comparison of selected models with WAIC
4. reporting with three decimal points

```{r}
#| label: code-RQ1_PSIS
#| fig-cap: ''
#| echo: true
#| warning: false
require(rethinking) # <1>
set.seed(12345) # <2>
RQ1_PSIS = compare( func=PSIS, # <3>
                    model01, model04, model07, model10 )
round( RQ1_PSIS, 3 ) # <4>
```
1. package requirement
2. seed for replication
3. comparison of selected models with PSIS
4. reporting with three decimal points

```{r}
#| label: fig-pred_mode01
#| fig-cap: 'Model 01, entropy score predictions'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
data_pred( true_data=data_H, # <1>
           stan_object=model01,
           normal=T, p=0.95, y_lim=c(-0.5, 1.1) )
```
1. user-defined function: plot entropy scores per child with transformation of potential intelligibility into the entropy scale

```{r}
#| label: fig-pred_mode04
#| fig-cap: 'Model 04, entropy score predictions'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
data_pred( true_data=data_H, # <1>
           stan_object=model04,
           normal=T, p=0.95, y_lim=c(-0.5, 1.1) )
```
1. user-defined function: plot entropy scores per child with transformation of potential intelligibility into the entropy scale

```{r}
#| label: fig-pred_mode07
#| fig-cap: 'Model 07, entropy score predictions'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
data_pred( true_data=data_H, # <1>
           stan_object=model07,
           normal=F, p=0.95, y_lim=c(-0.5, 1.1) )
```
1. user-defined function: plot entropy scores per child with transformation of potential intelligibility into the entropy scale

```{r}
#| label: fig-pred_mode10
#| fig-cap: 'Model 10, entropy score predictions'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
data_pred( true_data=data_H, # <1>
           stan_object=model10,
           normal=F, p=0.95, y_lim=c(-0.5, 1.1) )
```
1. user-defined function: plot entropy scores per child with transformation of potential intelligibility into the entropy scale

```{r}
#| label: fig-var_model07_1
#| fig-cap: 'Model 07, variability parameters density plots'
#| echo: true
#| warning: false
#| fig-height: 5.5
#| fig-width: 10
pars = c('s_b','s_i','s_u','Mw') # <1>
par( mfrow=c(2,2) )
dens_plot( stan_object=model07, p=0.95, # <2>
           pars=pars[1], x_lab=pars[1] )
for( i in 2:length(pars) ){
  dens_plot( stan_object=model07, p=0.95,
             pars=pars[i], x_lab=pars[i] )
}
par( mfrow=c(1,1) )
```
1. parameters of interest
2. user-defined function: generation of density plot with HPDI confidence intervals for selected parameters

```{r}
#| label: fig-vars_model07_2
#| fig-cap: 'Model 07, variability parameters density plots at the entropy scale '
#| echo: true
#| warning: false
#| fig-height: 5.5
#| fig-width: 10
dens_var( stan_object=model07, p=0.95, # <1>
          bp=0.5, bM=3,
          pars=c('s_b','s_i','s_u','Mw'))
```
1. user-defined function: plots the transformation of the variability parameters into the entropy scale.

## Research question 2 {#sec-RQ2_results}

talk about:

1. the capacity to create a potential intelligibility score
2. notice the ordering is not the same as in the previous plot for model 07 (makes sense)
3. epistemological concerns

- We can construct potential intelligibility.
- This allows us to offer a ranking of individuals, and even to make comparisons among them while also assessing the certainty of the comparison.

```{r}
#| label: code-SI_model10_extra
#| fig-cap: ''
#| echo: true
#| warning: false
parameters = 'SI' # <1>
model10_SI = parameter_recovery( # <2>
  stan_object = model07,
  est_par = parameters,
  p = 0.95 )
idx = order( model10_SI$mean, decreasing=F) # <3>
model10_SI = model10_SI[idx, ]
model10_SI
```
1. parameter of interest
2. user-defined function: displays concise parameter estimate information for the selected model
3. sorting and presenting ordered potential intelligibility

```{r}
#| label: fig-SI_model10
#| fig-cap: 'Model 10, potential intelligibility per child'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
col_func = colorRampPalette(c("red", "blue")) # <1>
col_plot = col_func( nrow(model10_SI) )
SI_plot( precis_object=model10_SI, # <2>
         col_string=col_plot )
```
1. color generation
2. user-defined function: plot ordered potential intelligibility

## Research question 3 {#sec-RQ3_results}

talk about:

1. model selection
2. parameter interpretations
3. parameter contrasts
4. potential intelligibility estimation
5. model predictions
6. outlier identification

- The evidence is not unequivocal between models 10, 11, and 12.
- Model 10 estimates SI without other knowledge.
- Model 12 indicates that NH start at the same level as HI/CI, but they evolve differently.
- There is no evaluation of whether the evolution reaches a valley at higher chronological ages.
- Interestingly, model 10 is similar to model 07, but with one extra assumption: there is some specific word-level variability for each child.

```{r}
#| label: code-RQ3_WAIC
#| fig-cap: ''
#| echo: true
#| warning: false
require(rethinking) # <1>
set.seed(12345) # <2>
RQ3_WAIC = compare( func=WAIC, # <3>
                    model01, model02, model03, model04, model05, model06,
                    model07, model08, model09, model10, model11, model12 )
round( RQ3_WAIC, 3 ) # <4>
```
1. package requirement
2. 
seed for replication
3. comparison of selected models with WAIC
4. reporting with three decimal points

```{r}
#| label: code-RQ3_PSIS
#| fig-cap: ''
#| echo: true
#| warning: false
require(rethinking) # <1>
set.seed(12345) # <2>
RQ3_PSIS = compare( func=PSIS, # <3>
                    model01, model02, model03, model04, model05, model06,
                    model07, model08, model09, model10, model11, model12 )
round( RQ3_PSIS, 3 ) # <4>
```
1. package requirement
2. seed for replication
3. comparison of selected models with PSIS
4. reporting with three decimal points

```{r}
#| label: code-recovery_model12
#| fig-cap: ''
#| echo: true
#| warning: false
parameters = c( 'a','aHS','bAm','bAmHS', # <1>
                'm_b','s_b','m_i','s_i','m_u','s_u','Mw')
model12_parameters = parameter_recovery( # <2>
  stan_object = model12,
  est_par = parameters,
  p = 0.95 )
model12_parameters = model12_parameters[-c(6:7),]
model12_parameters[1:5,] # <3>
```
1. parameters of interest
2. user-defined function: displays concise parameter estimate information for the selected model
3. reporting

```{r}
#| label: code-contrast_model12
#| fig-cap: ''
#| echo: true
#| warning: false
require(coda) # <1>
post = extract.samples( model12 ) # <2>
cont_post = data.frame( post$aHS[,2] - post$aHS[,1], # <3>
                        post$bAmHS[,2] - post$bAmHS[,1] )
colnames(cont_post) = c( 'aHS[2] - aHS[1]','bAmHS[2] - bAmHS[1]' )
hpdi_res = HPDinterval( as.mcmc(cont_post), prob=0.95 ) # <4>
attr(hpdi_res, 'dimnames')[[2]] = c('HPDI_lower','HPDI_upper')
cont_res = precis( cont_post, depth=2, hist=F, prob=0.95 ) # <5>
names(cont_res)[3:4] = c('CI_lower','CI_upper')
cont_res = cbind(cont_res, hpdi_res) # <6>
round( cont_res, 3 )
```
1. package requirement
2. posterior distribution samples for selected parameters
3. contrasts calculation
4. 95% Highest Probability Density Interval (HPDI) for contrasts
5. concise estimate information for contrasts
6. reporting with three decimal points

```{r}
#| label: code-SI_model12
#| fig-cap: ''
#| echo: true
#| warning: false
parameters = 'SI' # <1>
model12_SI = parameter_recovery( # <2>
  stan_object = model12,
  est_par = parameters,
  p = 0.95 )
nam = rownames(model12_SI)
d_extra = unique( data.frame( dlist[ c('cid','HS','Am') ] ) ) # <3>
model12_SI = cbind(d_extra, model12_SI)
rownames(model12_SI) = nam
idx = with( model12_SI, # <4>
            order( HS, mean, decreasing=F ) )
model12_SI = model12_SI[idx, ]
model12_SI[,-c(1,11)]
```
1. parameter selection
2. user-defined function: displays concise parameter estimate information for the selected model
3. extracting and concatenating estimates with hearing status and chronological age data
4. selecting, ordering and reporting data based on hearing status categories and chronological age

```{r}
#| label: fig-SI_model12_0
#| fig-cap: 'Model 12, ordered potential intelligibility per child'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
col_func = colorRampPalette(c("red", "blue")) # <1>
col_plot = col_func( nrow(model12_SI) )
idx = with( model12_SI, # <2>
            order( mean, decreasing=F ) )
model12_SI = model12_SI[idx, ]
SI_plot( precis_object=model12_SI, # <3>
         col_string=col_plot )
```
1. color generation
2. ordering data in terms of potential intelligibility
3. user-defined function: plot ordered potential intelligibility

```{r}
#| label: fig-SI_model12_1
#| fig-cap: 'Model 12, ordered potential intelligibility per child and hearing status'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
col_plot = rep(rethink_palette[1:2], each=16) # <1>
idx = with( model12_SI, # <2>
            order( HS, mean, decreasing=F ) )
model12_SI = model12_SI[idx, ]
SI_plot( precis_object=model12_SI, # <3>
         col_string=col_plot )
abline( v=16.5, lty=2 )
legend( 'topleft', c('NH','HI/CI'), fill=rethink_palette[1:2], bty='n' )
```
1. color generation
2. ordering data in terms of potential intelligibility
3. 
user-defined function: plot ordered potential intelligibility

```{r}
#| label: fig-SI_model12_2
#| fig-cap: 'Model 12, potential intelligibility per hearing status group and chronological age'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
idx = with( model12_SI, # <1>
            order( HS, Am, mean, decreasing=F ) )
model12_SI = model12_SI[idx, ]
plot_col = rethink_palette[1:2] # <2>
model_pred( precis_object=model12_SI, # <3>
            parameter_object=model12_parameters,
            col_string=plot_col )
```
1. sorting data by hearing status and chronological age
2. defining colors for the plot
3. user-defined function: plot of potential intelligibility on chronological age, per hearing status group

```{r}
#| label: fig-pred_mode12
#| fig-cap: 'Model 12, entropy score predictions'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
data_pred( true_data=data_H, # <1>
           stan_object=model12,
           normal=F, p=0.95, y_lim=c(-0.1, 1.1) )
```
1. user-defined function: plot entropy scores per child with transformation of potential intelligibility into the entropy scale

```{r}
#| label: code-RQ3_outliers
#| fig-cap: ''
#| echo: true
#| warning: false
require(rethinking) # <1>
set.seed(12345) # <2>
model_WAIC = WAIC( model12, pointwise=T ) # <3>
model_PSIS = PSIS( model12, pointwise=T ) # <4>
```
1. package requirement
2. seed for replication
3. pointwise WAIC estimates
4. pointwise PSIS estimates

```{r}
#| label: fig-RQ3_outliers
#| fig-cap: 'Outlier identification'
#| echo: true
#| warning: false
#| fig-height: 4
#| fig-width: 10
plot( model_PSIS$k, model_WAIC$penalty, # <1>
      xlab='PSIS k values', ylab='WAIC penalty',
      pch=19, col=rgb(0,0,0,alpha=0.2) )
abline( v=0.5, lty=2)
idx = which(model_PSIS$k>0.5) # <2>
for(i in idx ){
  text( x=model_PSIS$k[i], y=model_WAIC$penalty[i],
        paste0('(', dlist$uid[i], ',', dlist$cid[i], ')' ) )
}
```
1. plot of WAIC penalty against PSIS k values
2. identification of influential observations (sentences, individuals)

# Journal (to erase)

Things to consider:

1. It is important to address the issue of statistical power. (Needs to be addressed)
2. Multiple NHST tests inflate null-hypothesis rejection rates. (No need to address, because of the Bayesian approach)
   a. p-hacking (no issues)
   b. dropping observations (no issues)
   c. covariate analysis (no issues; mention they are exploratory)
3. Rich descriptions of the data help reviewers, the Editor, and other readers understand your findings. (already done)
4. Cherry-picking experiments, conditions, DVs, or observations can be misleading. (no issues)
5. Be careful about using null results to infer "boundary conditions" for an effect. (no issues)
6. Authors should use statistical methods that best describe and convey the properties of their data. (already done)

Follow [Behavior research methods](https://www.springer.com/journal/13428/submission-guidelines#Instructions%20for%20Authors_Behavior%20Research%20Methods%20General%20Information) for more information.

# References

::: {#refs}
:::